
Full text of "Opera Magistris (Elements of Applied Mathematics)"



6,000 pages super-quick, super-painless undergraduate transportable Book 
Elementary Applied Mathematics for Engineers (EAME)







Earth 6 

Opera Magistris 

3rd Edition 

Compendium on 

Elementary Applied Mathematics for Engineers 



Supervisors : 


F.D.C. Tigrou 

Vincent ISOZ 




Revision History 



Author(s) Description 

VI French 3rd Edition translation into English (6,000 first pages freely available!). Translation progress ~ 96%



1 Warnings 


2 Acknowledgements 


3 Introduction 


4 Arithmetic 


5 Algebra 


6 Analysis 


7 Geometry 


8 Mechanics 


9 Electromagnetism 


10 Atomistic 


11 Cosmology 


12 Chemistry 


13 Theoretical Computing 


14 Social Sciences 


15 Engineering 


16 Epilogue 


17 Biographies 


18 Chronology 



19 Humour 


20 Links 




22 Change Log 


23 Nomenclature 


List of Figures 


List of Tables 


List of Algorithms 






24 Donate 



Table of Contents 

1 Warnings 2 

1 Impressum 3 

1.1 Use of content 3 

1.2 How to use this book 4 

1.2.1 Ancillaries 4

1.3 Data Protection 7

1.4 Use of data 7

1.5 Data transmission 7

1.6 Agreement 7 

1.7 Errata 7 

2 License 8 

2.1 Preamble 8 

2.2 Applicability and Definitions 9 

2.3 Verbatim Copying 10 

2.4 Copying in Quantity 10 

2.5 Modifications 11 

2.6 Combining Documents 12 

2.7 Collections of Documents 13 

2.8 Aggregation with independent Works 13

2.9 Translation 13 

2.10 Termination 13 

2.11 Future revisions of this License 14 

3 Roadmap 15 

2 Acknowledgements 21 

3 Introduction 23 

1 Forewords 24 

2 Methods 31 

2.1 Descartes’ Method 34 

2.2 Archimedean Oath 35 

2.3 Scientific Publication Rules (SPR) 36 

2.4 Scientific Mainstream Media communication 38 

3 Vocabulary 39 

3.1 On Sciences 40 

3.2 Terminology 43 

4 Science and Faith 46 

4.0.1 Baloney detection kit 48 


4 Arithmetic 52 

1 Proof Theory 54 

1.0.1 Foundations Crisis 55 

1.1 Paradoxes 59 

1.1.1 Hypothetical-Deductive Reasoning 61 

1.2 Propositional Calculus 62

1.2.1 Propositions (premises) 63 

1.2.2 Connectors 65 

1.2.3 Decision procedures 71 

1.2.3.1 Non-axiomatic procedural decisions 72

1.2.3.2 Axiomatic procedural decisions 72

1.2.4 Quantifiers 77 

1.3 Predicate Calculus 79 

1.3.1 Grammar 79 

1.3.2 Language 80 

1.3.2.1 Symbols 80

1.3.2.2 Terms 81

1.3.2.3 Formulas 83

1.4 Proofs 86 

1.4.1 Rules of Proofs 88 

2 Numbers 97 

2.1 Digital Bases 100 

2.2 Type of Numbers 103 

2.2.1 Natural Integer Numbers 103 

2.2.1.1 Peano axioms 105

2.2.1.2 Odd, Even and Perfect Numbers 106

2.2.1.3 Prime Numbers 106

2.2.2 Relative Integer Numbers 108 

2.2.3 Rational Numbers 109 

2.2.4 Irrational Numbers 113 

2.2.5 Real Numbers 115 

2.2.6 Transfinite Numbers 117

2.2.7 Complex Numbers 120 

2.2.7.1 Geometric Interpretation of Complex Numbers 125

2.2.7.1.1 Fresnel Vectors (phasors) 129

2.2.7.2 Transformation in the plane 130

2.2.8 Quaternion Numbers 135

2.2.8.1 Matrix Interpretation of Quaternions 141

2.2.8.2 Rotations with Quaternions 142

2.2.9 Algebraic and Transcendental Numbers 148 

2.2.10 Universe Numbers (normal numbers) 151 

2.2.11 Abstract Numbers (variables) 152

Domain of a Variable 153

3 Arithmetic Operators 156 

3.1 Binary Relations 156 

3.1.1 Equalities 157 

3.1.2 Comparators 158 

3.2 Fundamental Arithmetic Laws 167 


3.2.1 Addition 167 

3.2.2 Subtraction 171 

3.2.3 Multiplication 174 

3.2.4 Division 177 

3.2.4.1 n-root 181

3.3 Arithmetic Polynomials 183 

3.4 Absolute Value 184 

3.5 Calculation Rules (operators priorities) 187 

4 Number Theory 192 

4.1 Principle of good order 192 

4.2 Induction Principle 193 

4.3 Divisibility 195 

4.3.1 Euclidean Division 197 

4.3.1.1 Greatest common divisor 199

4.3.2 Euclidean Algorithm 202 

4.3.3 Least Common Multiple 206 

4.3.4 Fundamental Theorem of Arithmetic 210 

4.3.5 Congruences (modular arithmetic) 211 

4.3.5.1 Congruence Class 214

4.3.5.2 Complete set of residues 217

4.3.5.3 Chinese remainder theorem 218

4.3.6 Continued fraction 222 

5 Set Theory 231 

5.1 Zermelo-Fraenkel Axiomatic 235 

5.1.1 Cardinals 240 

5.1.2 Cartesian Product 243 

5.1.3 Intervals 244 

5.2 Set Operations 245 

5.2.1 Inclusion 246 

5.2.2 Intersection 246 

5.2.3 Union 248 

5.2.4 Difference 249 

5.2.5 Symmetric Difference 250 

5.2.6 Product 251 

5.2.7 Complementarity 251 

5.3 Functions and Applications 253 

5.3.1 Cantor-Bernstein Theorem 260

6 Probabilities 269 

6.1 Event Universe 270 

6.2 Kolmogorov’s Axioms 271 

6.3 Conditional Probabilities 277 

6.3.1 Conditional Expectation 283 

6.3.2 Bayesian Networks 285 

6.4 Martingales 297 

6.5 Combinatorial Analysis 299 

6.5.1 Simple Arrangements with Repetition 300 

6.5.2 Simple Permutations without Repetitions 301 

6.5.3 Simple Permutations with Repetitions 302 


6.5.4 Simple Arrangements with Repetitions 303 

6.5.5 Simple Combinations without Repetitions 304 

6.5.6 Simple Combinations with Repetitions 306 

6.6 Markov Chains 308 

7 Statistics 313 

7.1 Samples 315 

7.2 Averages 316 

7.2.1 Laplace Smoothing 333 

7.2.2 Means and Averages properties 334 

7.3 Type of variables 339 

7.3.1 Discrete Variables and Moments 340 

7.3.1.1 Mean and Deviation of Discrete Random Variables 341

7.3.1.2 Discrete Covariance 351

7.3.1.2.1 Anscombe’s famous quartet 356

7.3.1.3 Mean and Variance of the Average 358

7.3.1.4 Coefficient of Correlation 360

7.3.2 Continuous Variables and Moments 365 

7.4 Fundamental postulate of statistics 366 

7.5 Diversity Index 367 

7.6 Distribution Functions (probabilities laws) 369 

7.6.1 Discrete Uniform Distribution 371 

7.6.2 Bernoulli Distribution 374 

7.6.3 Geometric Distribution 375 

7.6.4 Binomial Distribution 379 

7.6.5 Negative Binomial Distribution 386 

7.6.6 Hypergeometric Distribution 391 

7.6.7 Multinomial Distribution 397 

7.6.8 Poisson Distribution 401 

7.6.9 Normal & Gauss-Laplace Distribution 404 

7.6.9.1 Sum of two random Normal variables 412

7.6.9.2 Product of two random Normal variables 414

7.6.9.3 Bivariate Normal Distribution 416

7.6.9.4 Normal Reduced Centered Distribution 421

7.6.9.5 Henry’s Line 422

7.6.9.6 Q-Q plot 426

7.6.10 Log-Normal Distribution 428 

7.6.11 Continuous Uniform Distribution 431 

7.6.12 Triangular Distribution 435 

7.6.13 Pareto Distribution 437 

7.6.14 Exponential Distribution 443 

7.6.15 Cauchy Distribution 445 

7.6.16 Beta Distribution 448 

7.6.17 Gamma Distribution 451 

7.6.18 Generalized Gamma Distribution 456 

7.6.19 Chi-Square (Pearson) Distribution 457 

7.6.20 Student Distribution 461 

7.6.21 Fisher Distribution 465 

7.6.22 General Folded Normal Distribution 467 

Half-normal distribution 469

7.6.23 Benford Distribution 473 

7.7 Likelihood Estimators 476 

7.7.1 Normal Distribution MLE 478 

7.7.2 Poisson Distribution MLE 482 

7.7.3 Binomial (and Geometric) Distribution MLE 483 

7.7.4 Weibull Distribution MLE 484 

7.7.5 Gamma Distribution MLE 487 

7.8 Finite Population Correction Factor 488 

7.9 Confidence Intervals 491 

7.9.1 C.I. on the Mean with known Variance 492 

7.9.2 C.I. on the Variance with known Mean 497 

7.9.3 C.I. on the Variance with empirical Mean 502 

7.9.4 C.I. on the Mean with known unbiased Variance 504 

7.9.5 Binomial exact Test 507 

7.9.6 C.I. for a Proportion 510 

7.9.6.1 Test of equality of two Proportions 514

7.9.7 Sign Test 516 

7.9.8 Mood’s Median Test 518 

7.9.9 Poisson Test (1 sample) 520 

7.9.10 Poisson Test (2 samples) 523 

7.9.11 Confidence/Tolerance/Prediction Interval 526 

7.10 Weak Law of Large Numbers 528 

7.11 Characteristic Function 532 

7.12 Central Limit Theorem 536 

7.13 Univariate Hypothesis and Adequation tests 542 

7.13.1 Direction of hypothesis test and p-values 545

7.13.2 Fisher’s method for multiple p-values 552

Simpson’s Paradox (sophism) 555

7.13.3 Power of a test 557 

7.13.4 Power of the one sample Z-test 559 

7.13.5 Power of the one and two samples P-test 561 

7.13.6 Fieller’s test (ratio of two means) 563 

7.13.7 Analysis Of VAriance (ANOVA) 568

Analysis of Variance with one fixed factor 569

Contrasts 581

Analysis of Variance with two fixed factors without repetitions 584

Analysis of Variance with two fixed factors with repetitions 600

Multifactor ANOVA with Repeated measures 606

Latin Square ANOVA 606

Graeco-Latin Square ANOVA 610

Multivariate Analysis of Variance (MANOVA) 617

7.13.8 Equivalence tests 622 

7.13.9 Cochran C test 624 

7.13.10 Adequation Tests (goodness of fit tests) 626

Pearson’s chi-squared GoF test 626

Kolmogorov-Smirnov GoF test 632

Ryan-Joiner GoF test 638

Anderson-Darling GoF test 642

7.13.11 Likelihood-ratio tests 652 

7.14 Robustness 654 

7.14.1 Rank Statistics 656

L-Statistics 656

Ranks Distribution Law 657

Wilcoxon Rank Sum Test 658

Mann-Whitney Rank Sum Test 667

Treatment of equalities 673

One sample Wilcoxon rank sum signed test 675

Wilcoxon rank sum signed test for two paired samples 679

Kruskal-Wallis test 681

Friedman Test 685

Spearman Rank Correlation Coefficient 688

7.14.2 Range Statistics 691

Tukey’s Range Test 694

7.14.3 Extreme Value Theory 698 

7.15 Multivariate Statistics 699 

7.15.1 Principal Component Analysis 700

SVD and PCA 716

7.15.2 Correspondence Factorial Analysis (AFC) 718 

7.15.3 Chi-2 Test of Independence 723 

7.15.4 Cramer’s V 727 

7.15.5 Exact Fisher Test 731 

7.15.6 Cohen’s kappa agreement 736 

7.15.7 McNemar’s test 739 

7.16 Survival Statistics 742 

7.16.1 Kaplan-Meier Survival Rate 743 

7.16.2 Cochran-Mantel-Haenszel tests 748 

7.17 Propagation of Errors (experimental uncertainty analysis) 757 

7.17.1 Absolute and Relative Uncertainties (Direct calculation of bias) 757 

7.17.2 Statistical Errors 758 

7.17.3 Repeatability 760 

7.17.4 Error propagation (linearized approximation) 761 

7.17.5 Significant Numbers 762 

7.18 A World without statistics 764 

5 Algebra 767 

1 Calculus 769 

1.1 Equations and Inequations 770 

1.1.1 Equations 771 

1.1.2 Inequations 775 

1.2 Remarkable Identities 779 

1.3 Polynomials 783 

1.3.1 Euclidean Division of Polynomials 787 

1.3.2 Factorization Theorem of Polynomials 789 


1.3.3 Diophantine equation 790 

1.3.4 First order univariate Polynomial and Equations 791 

1.3.5 Second order univariate Polynomial and Equations 792 

1.3.5.1 Irrational Equations 796

1.3.5.2 Gold Number 798

1.3.6 Third order univariate Polynomial and Equations 798 

1.3.7 Fourth order univariate Polynomial and Equations 801 

1.3.8 Trigonometric Polynomials 804 

1.3.9 Cyclotomic Polynomials 804 

1.3.10 Legendre Polynomials 807 

2 Set Algebra 812 

2.1 Groups Algebra and Geometry 812 

2.1.1 Cyclic Groups 813 

2.1.2 Transformations Groups 816 

2.1.3 Group of Symmetries 824

2.1.3.1 Orbits and Stabilizers 828

2.1.4 Permutations Groups 829 

2.2 Galois Theory 839 

2.2.1 Elementary symmetric and Invariant Polynomials 839 

2.2.2 General Vieta’s formulas 843 

3 Differential and Integral Calculus 844 

3.1 Differential Calculus 844 

3.1.1 Differentials 854 

3.1.2 Usual Derivatives 861 

3.1.3 Implicit Differentiation 874 

3.1.4 Smoothness 880 

3.2 Integral Calculus 881 

3.2.1 Definite Integral 881 

3.2.2 Indefinite Integral 889 

3.2.3 Double Integral 896 

3.2.3.1 Fubini’s theorem 898

3.2.4 Integration by Substitution 899

3.2.4.1 Jacobian 901

3.2.5 Integration by Parts 906 

3.2.6 Usual Primitives 908 

3.2.7 Integral representation of first kind Bessel’s function 929 

3.2.8 Dirac Function 933 

3.2.9 Gamma Euler Function 935 

3.2.9.1 Euler-Mascheroni Constant 939

3.2.10 Curvilinear Integrals 941

Curvilinear Integral of a scalar field 941

Curvilinear Integral of a vector field 942

3.2.11 Integrals involving parametric equations 945 

3.3 Differential Equations 948 

3.3.1 First order Differential Equations 949 

3.3.2 Linear Differential Equations 949 

3.3.3 Resolution Methods of Differential Equations 951 

3.3.3.1 Method of characteristic polynomial 952

3.3.3.1.1 Resolution of the H.E. of the first order L.D.E. with constant coefficients 952

3.3.3.1.2 Resolution of the H.E. of the first order L.D.E. with non-constant coefficients 953

3.3.3.1.3 Resolution of the H.E. of the second order L.D.E. with constant coefficients 954

3.3.3.2 Integrating Factor Method (Euler’s Method) 958

3.3.3.3 Method of separation of variables 961

3.3.3.4 Method of constant variation 962

3.3.4 Classification of partial differential equations 965 

3.4 Systems of Differential Equations 969 

3.5 Regular Methods of Perturbations 974 

3.5.1 Perturbation theory for algebraic equations 974 

3.5.2 Perturbation theory of differential equations 976 

4 Sequences and Series 980 

4.1 Sequences 980 

4.1.1 Arithmetic Sequences 981 

4.1.2 Harmonic Sequences 984 

4.1.3 Geometric Sequences 985 

4.1.4 Cauchy Sequence 986 

4.1.5 Fibonacci Sequence 990 

4.1.6 Logic Sequences/Psychologist Sequences 991 

4.2 Series 992 

4.2.1 Gauss Series 993 

4.2.1.1 Bernoulli’s Numbers and Polynomials 996

4.2.2 Arithmetic Series 1001 

4.2.3 Geometric Series 1002 

4.2.3.1 Zeta function and Euler’s identity 1003

4.2.4 Telescoping Series 1007 

4.2.5 Grandi’s Series 1008 

4.2.6 Taylor and Maclaurin Series 1011 

4.2.6.1 Usual Maclaurin developments 1016

4.2.6.2 Taylor series of bivariate functions (multivariate Taylor series) 1024

4.2.6.3 Quadratic Form 1025

4.2.6.4 Lagrange Remainder 1028

4.2.6.5 Taylor Series with Integral Remainder 1030

4.2.7 Fourier Series (trigonometric series) 1032 

4.2.7.1 Power of a signal 1051

4.2.7.2 Fourier Transform 1053

4.2.8 Bessel Series 1062

4.2.8.1 Zero order Bessel’s Functions 1062

4.2.8.2 n order Bessel’s Functions 1062

4.2.8.3 Bessel’s Differential Equations of order n 1068

4.3 Convergence Criteria 1069 

4.3.1 Integral Test 1069 

4.3.2 D’Alembert Rule 1070 

4.3.3 Alternating Series Test 1073 


4.3.4 Fixed Point Theorem 1073 

4.4 Generating Functions (transformation of a sequence into a series) 1075

4.4.1 Ordinary Generating Functions (transformation of a sequence into a series) 1075

4.4.1.1 Composition of Generating functions 1078

4.4.2 Multivariate Generating Functions 1080 

4.4.3 Functional Generating Functions 1080 

5 Vector Calculus 1083 

5.1 Concept of Arrow 1085 

5.2 Set of Vectors 1086 

5.2.1 Pseudo-Vectors 1088

5.2.2 Multiplication by a scalar 1090

5.2.2.1 Rule of three 1090

5.3 Vector Spaces 1092 

5.3.1 Linear Combinations 1093 

5.3.2 Sub-vector spaces 1093

5.3.3 Generating families 1094

5.3.4 Linear Dependence or Independence 1094

5.3.5 Base of a vectorial space 1095 

5.3.6 Direction Angles 1097 

5.3.7 Dimensions of a vector space 1098 

5.3.8 Extension of a free family 1098 

5.3.9 Rank of a finite family 1099 

5.3.10 Direct Sums 1100 

5.3.11 Affine spaces 1101 

5.4 Euclidean Vector Spaces 1104

5.4.1 Scalar Product (Dot Product) 1106

5.4.1.1 Cauchy-Schwarz inequality 1111

5.4.1.2 Triangular Inequalities 1112

5.4.1.3 General Scalar/Dot Product 1113

5.4.2 Cross Product 1114 

5.4.3 Mixed Product (triple product) 1121 

5.5 Vectorial Functional Space 1122 

5.6 Hermitian Vector Space 1125 

5.6.1 Hermitian Inner Product 1126 

5.6.2 Types of Vectors Spaces 1127 

5.7 System of Coordinates 1128 

5.7.1 Cartesian (rectangular) Coordinate System 1128 

5.7.2 Spherical Coordinate System 1130 

5.7.3 Cylindrical Coordinate System 1135 

5.7.4 Polar Coordinate System 1137 

5.8 Differential Operators 1139 

5.8.1 Gradients of Scalar Field 1141

5.8.2 Gradients of Vector Field 1149 

5.8.3 Divergences of a Vector Field 1150 

5.8.4 Rotationals of a Vector Field (Curl) 1158 

5.8.5 Laplacians of Scalar Field (Laplace Operator) 1168

5.8.6 Laplacians of a Vector Field 1172


5.8.7 Remarkable Identities 1179 

5.8.8 Summary 1182 

6 Linear Algebra 1186 

6.0.1 Linear Systems 1190 

6.1 Linear Transformations 1194 

6.2 Matrices 1196 

6.2.1 Rank of a matrix 1197

6.2.2 Matrix Algebra 1202 

6.2.3 Type of Matrices 1205 

6.2.4 Determinant 1213 

6.2.4.1 Derivative of a Determinant 1225

6.2.4.2 Determinant Cofactor and Matrix Inverse 1226

6.3 Change of Basis (frames) 1228 

6.4 Eigenvalues and Eigenvectors 1231 

6.4.1 Rotation Matrices and Eigenvalues 1234 

6.5 Spectral Theorem 1237 

6.6 Singular Value Decomposition (SVD) 1242 

6.6.1 Singular Vectors 1243 

7 Tensor Calculus 1251 

7.1 Tensor 1252 

7.2 Indicial Notation 1254 

7.2.1 Summation on multiple index 1255 

7.2.2 Kronecker Symbol 1255 

7.2.3 Antisymmetric Symbol (Levi-Civita symbol) 1256 

7.3 Metric and Signature 1262 

7.4 Gram’s Determinant 1264 

7.5 Contravariant and Covariant Components 1270 

7.6 Operation in Basis 1272 

7.6.1 Gram-Schmidt Orthogonalization Method 1272 

7.6.2 Change of Basis 1273 

7.6.3 Reciprocal Basis (Dual Basis) 1274 

7.7 Euclidean Tensors (cartesian tensor) 1279 

7.7.1 Fundamental Tensor 1279 

7.7.2 Tensor product (dyadic) of two vectors and matrices 1280 

7.7.3 Tensor Spaces 1283 

7.7.4 Linear combination of tensors 1287 

7.7.5 Contraction of indices 1287 

7.7.5.1 Raising and lowering indices 1288

7.8 Special Tensors 1292 

7.8.1 Symmetric Tensor 1292 

7.8.2 Antisymmetric Tensor 1295 

7.8.3 Fundamental Tensor 1298 

7.9 Curvilinear Coordinates 1299 

8 Spinor Calculus 1332 

8.1 Unit Spinor 1333 

8.2 Geometric Properties 1339 

8.2.1 Plane Symmetries 1339 

8.2.2 Rotations 1342 


8.2.3 Properties of Pauli Matrices 


6 Analysis 1354 

1 Functional Analysis 1355 

1.1 Representations 1356 

1.1.1 Tabular Representation 1356 

1.1.2 Graphical Representation 1357

1.1.2.1 2D representations 1358

1.1.2.2 3D representations 1362

1.1.2.3 2D Vector representations 1368

1.1.2.4 Properties of visual representations 1371

1.1.3 Analytical Representation 1376 

1.2 Functions 1379 

1.2.1 Limits and Continuity of Functions 1391 

1.2.1.1 Limit laws 1398

1.2.2 Asymptotes 1399 

1.3 Logarithms 1405 

1.4 Transformations 1413 

1.4.1 Fourier Transform 1413 

1.4.2 Laplace Transform 1413 

1.4.3 Hilbert Transform 1414 

1.5 Functional dot product (inner product) 1414 

2 Complex Analysis 1419 

2.1 Linear Applications 1419 

2.2 Holomorphic Functions 1429 

2.2.1 Orthogonality of real and imaginary iso-curves 1435 

2.3 Complex Logarithm 1437 

2.4 Complex Integral Calculus 1440 

2.4.1 Convergence of a complex series 1447 

2.5 Path Decomposition 1456 

2.5.1 Inverse Path 1457 

2.6 Laurent Series 1459 

2.7 Singularities 1467 

2.8 Residue Theorem 1470 

2.8.1 Pole at infinity 1474 

3 Topology 1476 

3.1 General Topology 1477 

3.1.1 Topological Spaces 1478 

3.2 Metric Space and Distance 1479 

3.2.1 Equivalent Distances 1484 

3.2.2 Lipschitz Functions 1484 

3.2.3 Continuity and Uniform Continuity 1487 

3.3 Opened and Closed Set 1489 

3.3.1 Balls 1490 

3.3.2 Partitions 1494

3.3.3 Formal Ball 1496 

3.3.4 Diameter 1497 

3.4 Varieties 1499 


3.4.1 Surfaces Homeomorphism 1500 

3.4.2 Differential Varieties 1504 

4 Measure Theory 1506 

4.1 Measurable Spaces 1506 

4.1.1 Monotone Classes 1514 

7 Geometry 1517 

1 Trigonometry 1519 

1.1 Radian 1519 

1.2 Circle Trigonometry 1521 

1.2.1 Remarkable trigonometric triangle identities 1530 

1.2.1.1 Laws of Cosines 1534

1.2.1.2 Laws of Sines 1536

1.3 Hyperbolic Trigonometry 1537 

1.3.1 Remarkable hyperbolic identities 1544 

1.4 Spherical Trigonometry 1546 

1.5 Solid Angle 1551

2 Euclidean Geometry 1554 

2.1 Objects of Euclidean Geometry 1554 

2.1.1 Dimensions 1556 

2.2 Euclid Constructions 1564 

2.2.1 Segments and Lines 1565 

2.2.1.1 Quantities of the same type 1566

2.3 Plane Geometry 1571 

2.3.1 Displacements and Turnarounds 1572 

2.3.2 Plane angles 1573 

2.3.2.1 Angle Measurements 1580

2.3.2.2 Units of Angle Measurements 1583

2.3.2.3 Bisector 1586

2.3.3 Triangles 1588 

2.3.3.1 Equal Triangles (congruent triangles) 1589

2.3.3.2 Isosceles Triangles 1592

2.3.3.3 Equilateral Triangles 1596

2.3.3.4 Right Triangle 1596

2.3.3.5 Right Isosceles Triangle 1597

2.3.3.6 Inequalities in the triangles 1599

2.3.3.7 Triangles remarkable interior lines 1601

2.3.3.8 Pythagorean theorem 1602

2.3.3.9 Thales’ Theorem (intercept theorem) 1603

2.3.4 Parallelism 1608 

2.3.5 Circle 1610 

2.3.5.1 Circumscribed circle theorem 1613

2.3.5.2 Inscribed circle theorem 1614

2.3.5.3 Thales’ theorem of the circle 1616

2.3.5.4 Central angle theorem 1617

2.4 Hilbert’s Axioms 1621 

2.4.1 Incidence Axioms (axioms of association) 1622 

2.4.2 Order Axioms 1622 


2.4.3 Congruence Axioms 1623 

2.4.4 Continuity Axioms 1623 

2.4.5 Parallels Axioms 1624

2.5 Barycenter (centroid) 1625

2.6 Geometric Transformations 1630 

2.6.1 Translation 1632 

2.6.2 Homothety (scaling) 1633 

2.6.3 Shear (skew) transformation 1635 

2.6.4 Rotation 1637 

2.6.4.1 Gimbal lock 1642

2.6.4.2 Euler angles 1646

2.6.5 Reflection 1648 

3 Non-Euclidean Geometry 1651 

3.1 Axioms of non-euclidean geometry 1653 

3.2 Geodesic and Metric Equation 1654 

3.3 Riemann Spaces 1658 

4 Projective Geometry 1664 

4.1 Conical Perspective (Central Perspective) 1665 

4.1.1 Images of Points 1668 

4.1.2 Images of Straight Lines 1684 

4.2 Affine projections 1691 

4.2.1 Isometric perspective 1693 

4.2.2 Oblique perspective 1699 

4.2.3 Orthogonal projection 1700 

4.2.4 Spherical projection 1705 

4.2.4.1 Stereographic projection 1706

4.2.4.2 Cylindrical projection 1713

4.2.4.3 Mercator projection 1714

4.2.5 Other perspectives 1717 

4.3 Homogeneous Coordinates (projection coordinates) 1720 

4.3.1 ℙ² Projective Space 1721

4.3.2 ℙ³ Projective Space 1723

5 Analytical Geometry 1726 

5.1 Conics 1726 

5.1.1 Algebraic approach 1727 

5.1.2 Geometric Approach 1736 

5.1.3 Dandelin Theorem (Dandelin Spheres) 1749

5.1.4 Classification of conical by the determinant 1750 

5.2 Parametrizations 1755 

5.2.1 Equation of the Plane 1755 

5.2.2 Equation of the Straight line 1759 

5.2.2.1 Distance from a line to a point 1761

5.2.2.2 Line defined by the intersection of planes 1762

5.2.2.3 Parametric equation of a line in ℝ³ 1762

5.2.3 Equation of a Square 1764 

5.2.4 Equation of a Cycloid 1765 

5.2.5 Equation of an Epicycloid 1766 

5.2.6 Equation of an Hypocycloid 1768 


5.2.7 Surface of revolution 1770 

5.2.7.1 Cone 1770

5.2.7.2 Sphere 1772

5.2.7.3 Ellipsoid (spheroid, geoid) 1774

5.2.7.4 Cylinder 1777

5.2.7.5 Paraboloid 1778

5.2.7.6 Hyperboloid 1780

5.2.7.7 Torus 1785

6 Differential Geometry 1787 

6.1 Parametric Curves 1787 

6.2 Isolines 1793 

6.3 Frenet Frame 1799 

6.4 Surface Patches 1810

6.4.1 Metric of a Surface Patch 1811

6.4.1.1 Regularity of a Surface 1813

7 Geometric Shapes 1817 

7.1 Known Surfaces (Areas) 1818 

7.1.1 Polygons 1818 

7.1.2 Rectangle 1823 

7.1.3 Square 1824 

7.1.4 Unspecified Triangle 1826 

7.1.5 Isosceles Triangle 1830 

7.1.6 Equilateral Triangle 1832 

7.1.7 Right Triangle 1834 

7.1.8 Trapezoid 1835 

7.1.9 Parallelogram 1837 

7.1.10 Hexagon 1839 

7.1.11 Rhombus 1844 

7.1.12 Circle 1845 

7.1.13 Ellipse 1847 

7.2 Known Volumes 1850 

7.2.1 Polyhedron 1851 

7.2.1.1 Parallelepiped 1851

7.2.1.1.1 Moment of Inertia of a rectangular plate 1853

7.2.1.1.2 Moment of Inertia of a triangular plate 1855

7.2.1.2 Pyramid 1857

7.2.1.2.1 Moment of Inertia of a regular square pyramid 1858

7.2.1.3 Right Prism 1859

7.2.1.4 Regular Polyhedron 1860

7.2.1.5 Regular Tetrahedron 1865

7.2.1.6 Regular hexahedron (cube) 1868

7.2.1.7 Regular octahedron 1868

7.2.1.8 Regular Icosahedron 1872

7.2.1.9 Regular Dodecahedron 1876

7.2.2 Solids of Revolution 1881 

7.2.2.1 Cylinder 1883

7.2.2.2 Cone 1887

7.2.2.3 Sphere 1889

7.2.2.4 Torus 1893

7.2.2.5 Ellipsoid (spheroid, geoid) 1897

7.2.2.6 Paraboloid 1902

7.2.2.7 Wine Barrel with Circular Section 1904

8 Graph Theory 1908 

8.1 Type of Graphs and Structures 1909 

8.2 Graph Adjacency matrix 1931 

8.3 Categories 1936 

9 Knot Theory 1940 

9.1 Braids Representation 1940 

9.1.1 Braids Group 1942 

9.2 Knot Representation 1945 

9.2.1 Knots Group 1947 

9.3 Tait’s Knot 1950 

9.4 Mathematical Formalisation 1954 

9.4.1 Planar Representation 1962 

8 Mechanics 1964 

1 Principia 1966 

1.1 System of Units 1969 

1.1.1 Dimensional Analysis 1976 

1.1.1.1 Time 1978

1.1.1.2 Length 1979

1.1.1.3 Mass 1981

1.1.1.4 Energy 1984

1.1.1.5 Electric Charge 1986

1.1.2 Scientific Notation and Metric Prefixes 1989 

1.1.3 Scales of Measurements 1992 

1.2 Distributions 2004 

1.3 Constants 2005 

1.3.1 Mathematical Constants 2005 

1.3.2 Universal Constants (fundamental constants) 2006 

1.3.3 Astronomical/Astrophysical parameters and constants 2008

1.3.4 Chemical parameters 2008 

1.3.5 Material parameters 2009 

1.3.6 Planck’s constants 2010 

1.4 Principles of Physics 2014 

1.4.1 Principle of Causality 2014 

1.4.2 Principle of Conservation of Energy 2015

1.4.3 Principle of Least Action 2017 

1.4.4 Noether’s Principle (Noether’s theorem) 2017 

1.4.4.1 Invariance by translation in space 2020

1.4.4.2 Invariance by rotation in space 2021

1.4.4.3 Invariance by translation in space 2022

1.4.4.4 Noether’s theorem 2023

1.4.5 Curie’s Principle 2027 

1.5 Point Spaces 2028

2 Analytical/Lagrangian Mechanics 2035 


2.1 Lagrangian formalism 2037 

2.1.1 Generalized coordinates and frames 2038 

2.1.2 Variational Principle 2042 

2.1.3 Euler-Lagrange Equation 2043

2.1.3.1 Beltrami Identity 2052

2.1.3.2 Theorem of Variational Calculus 2053

2.2 Canonical Formalism 2055 

2.2.1 Legendre Transform 2055 

2.2.2 Hamiltonian 2056 

2.2.3 Poisson bracket 2063 

2.2.4 Canonical transformations 2067 

3 Classical Mechanics 2068

3.1 Newton’s Laws 2072 

3.1.1 Newton First Law (Inertia Law) 2072 

3.1.2 Newton Second Law (Fundamental Principle of Dynamics) 2074

3.1.3 Newton Third Law (Law of Action and Reaction) 2077 

3.2 Center of Mass and Reduced Weight 2078 

3.2.1 Center of Mass Theorem 2081 

3.2.2 Guldin’s Theorem 2086 

3.3 Kinematics of Rectilinear Motion 2089 

3.3.1 Position 2090 

3.3.2 Velocity 2090 

3.3.3 Acceleration 2092 

3.3.3.1 Osculating Plane 2095

3.3.4 Galilean Relativity Principle 2097 

3.4 Angular Momentum 2101 

3.4.1 Moments 2106 

3.4.2 Static Forces 2111 

3.5 Ballistics 2115 

3.6 Kinematics of Circular Motion 2120 

3.7 Energy, Work and Power 2128 

3.7.1 Conservative vector field 2129 

3.7.2 Kinetic Energy 2132 

3.7.2.1 Moment of inertia 2133

3.7.2.2 Gyroscope 2150

3.7.2.2.1 Classical Approach with precession only 2152

3.7.2.2.2 Lagrangian Approach with precession and nutation 2157

3.7.2.3 König’s kinetic and angular momentum theorems 2169

3.7.2.3.1 First König’s Theorem (König’s angular momentum theorem) 2169

3.7.2.3.2 Second König’s Theorem (König’s kinetic energy theorem) 2171

3.7.3 Gravitational Potential Energy 2172 

3.7.3.1 Gravitational Potential Energy of a Material Sphere 2176

3.7.4 Conservation of Total Mechanical Energy 2178

3.7.4.1 Generalized Newton Law 2180

3.7.5 Conservation of Linear Momentum 2184 


3.7.5.1 Elastic Collision in 1-dimension 2185

3.7.5.2 Elastic Collision in 2-dimensions 2187

3.7.5.3 Inelastic Collision in 2-dimensions 2189

3.7.6 Power 2190

3.7.6.1 Power of a turning machine 2191

3.7.6.1.1 Power yield 2192

3.8 Relative Movements and Inertial Forces 2193 

3.8.1 Coriolis force and deflection magnitude 2201 

3.9 Oscillating Movements 2205 

3.9.1 Newton’s cradle 2205 

3.9.2 Simple Pendulum 2207 

3.9.3 Physical Pendulum 2214 

3.9.4 Elastic Pendulum (spring pendulum) 2216 

3.9.4.1 One degree of freedom elastic pendulum with/without friction 2217

3.9.4.2 Two degrees of freedom elastic pendulum without friction 2221

3.9.5 Conical Pendulum 2222 

3.9.6 Torsion Pendulum 2224 

3.9.7 Foucault’s Pendulum 2225 

3.9.8 Huygens’ Pendulum (and brachistochrone curve) 2230 

3.9.9 Double Pendulum 2237 

3.9.10 Inverted Pendulum 2242 

3.10 Tribology 2246 

3.10.1 Exponential Friction 2253 

3.10.2 Horizontal Viscous Friction 2255 

3.10.3 Vertical Viscous Friction 2257 

3.10.4 Stokes’ Vertical Viscous Friction 2258 

3.10.5 Stokes’ Horizontal Viscous Friction 2261 

3.10.6 Friction’s Heat Factor 2264 

4 Wave Mechanics 2265 

4.0.1 Wave Function 2265

4.0.2 Wave Equation 2266

4.1 Type of Waves 2268 

4.1.1 Periodic Waves 2268 

4.1.2 Harmonic Waves 2269 

4.1.2.1 Phase velocity and Group velocity 2270

4.1.3 Stationary Waves 2272

4.1.4 Vibration Modes in a Stretched String 2275

4.1.4.1 Dirichlet Conditions 2277

4.1.4.2 Neumann Conditions 2282

4.2 Non-relativistic Lagrangian of a String 2284

4.3 Vibrational modes of a circular membrane 2288 

5 Statistical Mechanics 2298 

5.1 Statistical Information Theory 2298 

5.2 Boltzmann Law 2305

5.3 Statistical Physics Distributions 2310 

5.3.1 Maxwell Distribution (velocity distribution) 2310 


5.3.2 Maxwell-Boltzmann Distribution 2316 

5.3.2.1 Boltzmann Distribution 2322

5.3.2.2 Fermi-Dirac Distribution 2324

5.3.2.3 Bose-Einstein Distribution 2328

5.4 Brownian Motion 2333 

6 Thermodynamics 2343 

6.1 Thermodynamic Variables 2344 

6.2 Thermodynamics Systems 2347 

6.3 Thermodynamic Transformations 2348 

6.4 State Variables 2350 

6.4.1 Phases 2352 

6.5 Equation of State 2354 

6.5.1 Ideal Gas Law 2354

6.5.2 State equation of a Liquid 2355 

6.5.3 State equation of Solids 2357 

6.6 Laws of Thermodynamics 2360 

6.7 Calorific Capacities (heat capacity) 2362 

6.8 Internal Energy 2372 

6.8.1 Work (energy) of Mechanical Forces 2376 

6.8.2 Enthalpy 2378 

6.8.3 Laplace’s Law 2381 

6.8.4 Saint-Venant Thermodynamic Equation 2385

6.8.5 Thermoelastic coefficients 2387

6.9 Heat 2390 

6.9.1 Entropy 2392 

6.9.1.1 Heat Flow 2396

6.9.2 Carnot Cycle 2398 

6.10 Maxwell relations 2402 

6.11 Continuity Equation 2406 

6.11.1 Heat Equation 2412

Fick’s laws of diffusion 2423

6.12 Thermal radiation 2426 

6.12.1 Black Body radiation 2426

Stefan-Boltzmann law 2428

Planck’s law 2437

7 Continuum Mechanics 2450 

7.1 Rigid Bodies 2451 

7.1.1 Pressures 2451 

7.1.2 Elasticity of Solids 2452 

7.1.2.1 Hooke’s law 2457

7.1.2.2 Shear Modulus 2465

7.1.2.3 Compressibility Modulus (bulk modulus) 2473

7.1.2.4 Flexural Modulus (bending modulus) 2475

7.1.2.5 Transverse Wave in Solids 2478

7.2 Liquids 2482 

7.2.1 Pascal’s Fluid Theorem 2484 

7.2.2 Viscosity 2486 

7.2.2.1 Poiseuille’s Law 2489


7.2.3 Bernoulli’s Theorem 2491

7.2.3.1 Torricelli’s law 2498

7.2.3.2 Communicating vessels 2502

7.2.3.3 Venturi effect 2504

7.2.3.4 Pitot Tube 2507

7.2.3.5 Pressure drop (pressure loss) 2509

7.2.4 Navier-Stokes Equations 2511

7.2.4.1 Incompressible flow 2528

7.2.4.2 Compressible flow 2536

7.2.4.3 Static flow 2536

7.2.4.4 Reynolds number 2537

7.2.4.5 Boussinesq approximation (buoyancy) 2540

7.2.4.6 Stokes’ law 2542

7.2.5 Hydrostatic Pressure 2548 

7.2.6 Archimedes’ principle 2550 

7.2.7 Speed of sound in a liquid 2552 

7.3 Gas 2553 

7.3.1 Types of Gas 2553 

7.3.1.1 Perfect Gas 2553

7.3.1.2 Real Gas 2557

7.3.2 Virial Theorem 2557 

7.3.3 Kinetic pressures (kinetic theory of gases) 2570 

7.3.4 Kinetic Temperature 2572 

7.3.5 Amagat and Dalton’s law 2573 

7.3.6 Mean free path (in kinetic theory) 2575 

7.4 Plasmas 2578 

7.4.1 Plasma Frequency 2580 

9 Electromagnetism 2584 

1 Electrostatics 2586 

1.1 Electric Force 2587 

1.2 Electric Potential 2591 

1.2.1 Path Independence 2594

1.3 Equipotential and Field lines 2595

1.3.1 Infinite straight wire 2597 

1.3.2 Electric Rigid Dipole 2599 

1.4 Electric Field Flow 2610 

1.4.1 Capacitor 2611 

1.4.1.1 Dielectric strength 2615

1.5 Electrostatic potential energy 2616 

2 Magnetostatics 2618 

2.1 Ampere’s theorem 2621 

2.1.1 Infinitely long solenoid 2623 

2.1.2 Toroidal coils 2624 

2.1.3 Electromagnet 2626 

2.1.3.1 Strength of a magnet or electromagnet 2627

2.2 Maxwell-Ampere Relation 2629 

2.3 Biot-Savart law 2630 


2.3.1 Magnetic field for a current loop 2631 

2.3.2 Magnetic field for an infinite wire 2633 

2.4 Magnetic dipole 2635 

2.5 Lorentz law (Lorentz force) 2643 

2.5.0.1 Magnetic Vector Potential 2647

2.5.0.2 Work of Magnetic Field 2648

2.5.1 Classical Hall effect 2651 

2.5.2 Larmor radius 2654 

2.5.3 Energy of a magnetic dipole 2660 

2.6 Langevin treatment of Diamagnetism and Paramagnetism 2663 

2.6.1 Langevin model of diamagnetism 2663 

2.6.2 Langevin model of paramagnetism 2666 

3 Electrodynamics 2671

3.1 Maxwell Equations 2672 

3.1.1 First Maxwell Equation (constant electric flow) 2672 

3.1.2 Second Maxwell Equation (non-existence of magnetic monopole) 2675

3.1.3 Third Maxwell Equation 2676 

3.1.3.1 Betatron 2679

3.1.4 Fourth Maxwell Equation 2683 

3.1.5 Magnetic Monopoles 2686 

3.2 Charge conservation equation 2688 

3.3 Gauge Theory 2689 

3.3.1 Electromagnetic field tensor 2696 

3.4 Electromagnetic wave equation 2712 

3.4.1 Helmholtz equation 2717 

3.4.2 Energy flow transportation (Poynting vector) 2719 

3.4.3 Emissions 2722 

3.5 Synchrotron radiation (bremsstrahlung) 2725 

3.5.1 Lienard-Wiechert potentials 2730 

3.5.2 Retarded Electric and Magnetic fields 2737 

4 Electrokinetics 2756 

4.1 Kirchhoff’s laws 2757

4.1.1 Mesh law (Kirchhoff’s Loop Law) 2758 

4.1.2 Nodes law (Kirchhoff’s Point Law) 2758 

4.2 Drude model 2759 

4.3 Ohm’s law 2765 

4.3.1 Equivalent Resistance 2768 

4.3.2 Equivalent Capacities 2769 

4.4 Electromotive Force 2771 

4.4.1 Faraday’s law of induction 2775

5 Optics (ray optics) 2777 

5.1 Sources and Shadows 2777 

5.2 Colors 2781 

5.3 Radiometry/Photometry 2788

5.3.1 Energy flow 2788 

5.3.1.1 Beer-Lambert law 2788

5.3.2 Light Intensity (Radiant Intensity) 2790 


5.3.3 Energy Emittance (Radiant Emittance) 2791 

5.3.4 Radiance and Luminance 2792 

5.3.4.1 Lambert’s Law 2794

5.3.5 Kirchhoff’s law of Radiation 2795 

5.3.6 Spectral Decomposition 2796 

5.4 Law of Refraction 2797 

5.4.1 Refractive index 2801 

5.4.2 Snell’s law 2804 

5.4.3 Cherenkov radiation 2810 

5.5 Descartes’ Formulas 2812

5.5.1 Stigmatism 2816 

5.5.2 Lenses 2821 

5.5.2.1 Optical Magnification 2833

5.5.2.2 Human eyes 2841

5.5.3 Triangular Prism 2841 

5.5.4 Pentaprism 2846 

5.6 Rainbow 2850 

6 Wave Optics 2857 

6.1 Huygens’ principle 2857 

6.2 Fraunhofer Diffraction 2861 

6.2.1 Case of a rectangular aperture 2861

6.2.1.1 Optical resolution 2868

6.2.2 Case of a network of rectangular apertures 2870 

6.2.3 Young’s interference experiment 2876 

6.3 Light polarization 2883 

6.3.1 Linear polarization 2888 

6.3.2 Elliptical polarization 2889 

6.3.3 Circular polarization 2890 

6.3.4 Natural polarization 2891

6.3.5 Malus’ law 2894 

6.4 Coherence and interference 2895 

6.5 LASER 2901 

10 Atomistic 2906 

1 Corpuscular Quantum Physics 2908 

1.1 Dalton’s model 2909 

1.2 Thomson’s model 2910 

1.3 Rutherford’s model 2911

1.4 Bohr’s Model 2913 

1.4.1 Bohr’s Postulates 2913 

1.4.2 Quantification 2914 

1.4.3 Hydrogen Type Atoms Model without dragging 2916 

1.4.4 Hydrogen Type Atoms Model with dragging 2921 

1.4.5 Neutron Assumption 2924 

1.5 Wilson and Sommerfeld’s Model 2926 

1.6 Relativistic Sommerfeld Model 2930 

1.6.1 Magnetic dipole moment 2945 

1.6.2 Spin 2948 


1.6.3 Pauli exclusion principle 2950 

1.7 Electron configuration (atomic orbital) 2951

2 Wave Quantum Physics 2960 

2.1 Postulates 2961 

2.1.1 1st Postulate: Quantum State 2962 

2.1.2 2nd postulate: Time evolution of a quantum state 2964 

2.1.3 3rd postulate: Observables and operators 2965 

2.1.4 4th postulate: Measure of a property 2967

2.1.5 5th postulate: Average of a property 2969

2.2 Classical principles of uncertainty 2970 

2.2.1 First classical uncertainty relation 2971 

2.2.2 Second classical uncertainty relation 2972 

2.2.3 Third classical uncertainty relation 2973 

2.3 Quantum algebra 2976 

2.3.1 Linear functional operators 2976 

2.3.1.1 Hermitian and Self-adjoint operators 2980

2.3.1.2 Commutators and Anticommutators 2983

2.3.1.3 Representatives 2987

2.3.1.4 Eigenvalues and Eigenfunctions 2989

2.3.1.4.1 Orthogonality of eigenfunctions 2990

2.3.1.5 Dirac formalism 2993

2.3.1.5.1 Kets and Bras 2993

2.4 Schrodinger Model 2996 

2.4.1 de Broglie associated wave 2996 

2.4.2 Classical Schrodinger Wave Equation 2998 

2.4.2.1 Schrodinger Hamiltonian 3000

2.4.2.2 De Broglie normalization condition 3004

2.4.2.3 Bound and unbound states 3006

2.4.3 Classical Schrodinger equation of evolution 3008

2.4.3.1 Operator of evolution 3008

3 Relativistic Quantum Physics 3066 

3.1 Relativistic Schrodinger evolution equation 3067 

3.1.1 Antimatter 3068 

3.2 Generalized Klein-Gordon Equation 3071 

3.3 Classical free Dirac equation 3077 

3.4 Linearized Dirac Equation 3090 

3.5 Generalized Dirac Equation 3102 

3.6 Pauli Equation 3103 

4 Nuclear Physics 3112 

4.1 Nuclear Weapon 3112

4.2 Radioactivity 3115 

4.2.1 Disintegration 3118 

4.2.1.1 Half-life isotope 3119

4.2.2 Activity 3120

4.2.2.1 Carbon-14 dating (radiocarbon dating) 3122

4.2.2.2 Radioactive Decay chain 3126

4.2.3 Two level radioactive cascade 3127

4.2.3.1 Secular equilibrium 3129

4.2.3.2 Transient equilibrium 3130

4.2.3.3 Nonequilibrium 3130

4.2.4 Radioactive phenomena 3131 

4.2.4.1 Nuclear Fusion (1) 3136

4.2.4.2 Nuclear Fission (2) 3137

4.2.4.3 Alpha Disintegration (3) 3139

4.2.4.4 Beta- Disintegration (4) 3148

4.2.4.5 Beta+ Disintegration (5) 3149

4.2.4.6 Electronic capture (6) 3150

4.2.4.7 Gamma emission (7) 3151

4.2.4.8 Internal conversion (8) 3152

4.3 Radiation protection 3154 

5 Quantum Field Theory 3176 

5.1 Yukawa potential 3179

5.1.1 Mass fields 3181 

5.1.2 Non-mass fields 3182 

5.2 Euler-Lagrange equation for Fields 3184 

5.3 Gauge Theories 3195 

5.3.1 Global Gauge invariance 3196 

5.3.2 Local Gauge invariance 3198 

6 Elementary Particle Physics 3203 

6.1 Coupling Constants 3205 

6.2 Spin magnetic resonance 3210 

11 Cosmology 3216 

1 Astronomy (Celestial Mechanics) 3218 

1.1 Drake Equation 3218 

1.2 Kepler’s Laws 3219 

1.2.1 First Kepler’s Law (conicity law) 3219 

1.2.2 Second Kepler’s Law (area law) 3220 

1.2.2.1 Time of flight 3222

1.2.3 Third Kepler’s Law (periods’ law) 3226 

1.3 Newton Gravitational Law 3229 

1.3.1 Gaussian Formulation of Newtonian Gravity 3236

1.3.2 Shell Theorem 3238 

1.3.3 Orbital speed 3240 

1.3.4 Asteroids/Meteors impact velocity 3241 

1.3.5 Spherisation of Celestial Bodies 3241 

1.3.5.1 Flattening of Celestial Bodies (rotational flattening) 3243

1.3.6 Stability of Atmospheres 3245 

1.4 Roche’s Limit 3246 

1.5 Keplerian Orbitals 3248 

1.5.1 First Binet Formula 3248

1.5.2 Second Binet Formula 3253

1.5.3 Keplerian orbital period 3257

1.5.4 Classical deflection of light 3258

1.5.5 Classical precession of perihelia 3260

1.6 Duration of the diurnal arc 3270 


1.6.1 Trigonometric parallax 3277 

1.7 Planets’ Motion 3279 

1.7.1 Synodic and Sidereal period 3279 

1.7.2 Planet’s apparent retrograde motion 3282 

1.8 Lagrange Points 3288 

1.8.1 Equilibrium points of the first type 3297 

1.8.1.1 L1 Lagrange point 3297

1.8.1.2 L2 Lagrange point 3301

1.8.1.3 L3 Lagrange point 3304

1.8.2 Equilibrium points of the second type 3306 

1.8.2.1 L4, L5 Lagrange points 3306

1.9 Relativistic Doppler-Fizeau Effect 3312

1.9.1 Apparent speed 3316 

2 Astrophysics 3320 

2.1 Stars 3320 

2.1.1 Stellar Physics 3327 

2.1.1.1 Collapse of an Interstellar Cloud 3327

2.1.1.1.1 Limit Mass Cloud for Ionization (rogue planets) 3329

2.1.1.1.2 Limit Mass Cloud for Fusion (black dwarf) 3330

2.1.1.2 Nuclear Duration Life 3332

2.1.1.3 Internal Temperature 3334

2.1.1.4 External temperature 3336

2.1.1.5 Equation of Hydrostatic Equilibrium 3338

2.1.1.6 Brightness 3341

2.1.1.7 Shining (apparent brightness) 3341

2.1.1.8 Apparent magnitude 3343

2.1.1.9 Absolute magnitude 3345

2.1.2 Pulsative Variable Stars 3348 

2.1.3 Neutron Stars (magnetars) 3351 

2.1.3.1 Chandrasekhar limit 3352

2.1.3.2 Neutron star magnetic field 3353

2.2 Galaxies 3354 

2.2.1 Radial Speed Anomaly 3355

3 Special Relativity 3358 

3.1 Assumptions and Principles 3359 

3.1.1 Postulate of Invariance 3359 

3.1.2 Cosmological Principle 3359 

3.1.3 Special Relativity Principle 3360 

3.2 Lorentz Transformations/Boost 3362 

3.2.1 Displacement four-vector 3366

3.2.1.1 Wave Equation Invariance 3369

3.2.1.2 Hypergeometric interpretation 3370

3.2.2 Velocity four-vector 3372

3.2.3 Current four-vector 3376

3.2.4 Acceleration four-vector 3376

3.2.5 Relativistic sum of velocities 3381 

3.2.6 Relativistic lengths variation (length contraction) 3382 


3.2.7 Relativistic time variation (time dilatation) 3384 

3.2.7.1 Hafele-Keating experiment 3386

3.2.7.2 Twins paradox 3390

3.2.8 Apparent relativistic mass 3392 

3.2.8.1 Mass-energy equivalence 3397

3.2.8.2 Relativistic Lagrangian 3398

3.2.8.3 Relativistic (linear) momentum 3400

3.2.8.3.1 Einstein relation 3404

3.2.8.3.2 Time of flight 3405

3.2.8.3.3 Relativistic force 3407

3.2.9 Relativistic electrodynamics 3410 

3.2.9.1 Tensor field transformation 3418

3.3 Minkowski space-time 3419 

3.3.1 Four-vectors 3421 

3.3.2 Universe light cone 3423 

4 General Relativity 3430 

4.1 Assumptions and Principles 3430 

4.1.1 Equivalence Postulates 3430 

4.1.2 Mach Principle 3434 

4.2 Metrics 3435 

4.2.1 Schild Criteria (Einstein red-shift effect Newtonian approach) 3440 

4.3 Equations of movement 3443 

4.3.1 Geodesic equations 3452 

4.3.2 Newtonian Limit 3456 

4.4 Stress-Energy Tensor 3459 

4.5 Einstein’s Field Equations 3465 

4.5.1 Cosmological Constant 3471 

4.5.2 Schwarzschild Solution 3473 

4.6 Experimental Tests 3486 

4.6.1 The Precession of Mercury’s Perihelion 3486 

4.6.2 Deflexion of Light 3497 

4.6.3 Shapiro Effect (delay) 3502 

4.6.4 Black Holes 3509 

5 Cosmology 3514 

5.1 Newtonian Cosmological Model 3514 

5.1.1 Hubble’s Law 3516 

5.2 Friedmann Equations 3519 

5.2.0.1 Critical Density 3523

5.3 Cosmological models of Friedmann-Lemaitre 3528 

5.3.1 Flat spaces (k = 0) 3528 

5.3.1.1 Flat space dominated by matter 3529

5.3.1.2 Flat space dominated by radiation 3531

5.3.2 Spherical spaces (k > 0) 3533

5.3.2.1 Spherical space dominated by matter 3539

5.3.2.2 Spherical space dominated by radiation 3541

5.3.3 Hyperbolic spaces (k < 0) 3544

5.3.4 Matter dominated hyperbolic space 3547 

5.3.5 Hyperbolic space dominated by radiation 3550 


5.4 Observable Universe 3553 

5.5 Cosmic Microwave Background (CMB) 3568 

6 String Theory 3573 

6.1 Wave equation of a transversal string 3574

6.2 Non-relativistic Wave equation of a transversal string 3576 

6.2.1 Nambu-Goto Action 3582 

6.3 Lagrangian of a String 3589 

12 Chemistry 3593 

1 Quantum Chemistry 3594 

1.1 Infinite three-dimensional rectangular potential 3594 

1.2 Molecular Vibrations 3597 

1.3 Hydrogenoid Atom 3600

1.4 Rigid Rotator 3604

1.4.1 Potential Profile 3631 

2 Molecular Chemistry 3634 

2.0.1 Orbital Approximations 3635

2.1 LCAO Method 3639 

2.2 Molecular Rotational Energy Levels 3648 

2.3 Molecular Vibrational Energy Levels 3650 

3 Analytical Chemistry 3652 

3.1 Simple Mixtures 3653 

3.2 Reactions 3655 

4 Thermochemistry 3659 

4.0.1 Chemical transformations 3659

4.1 Molar Quantities 3661 

4.1.1 Standard enthalpy of reaction 3668 

4.1.1.1 Kirchhoff’s Enthalpy Law 3671

13 Theoretical Computing 3675 

1 Numerical Methods/Analysis 3677

1.1 Computer Representation of Numbers 3679

1.1.1 Decimal System 3679 

1.1.2 Binary system 3680 

1.1.2.1 Binary arithmetic 3681

1.1.3 Hexadecimal System 3681 

1.1.4 Octal System 3682 

1.1.5 Conversion of decimal system to non-decimal system 3682

1.2 Algorithm Complexity 3685 

1.2.1 NP-Completeness 3690

1.3 Integer Part 3692 

1.4 Heron’s Square Root Algorithm 3694

1.5 Archimedes Algorithm 3696 

1.6 Euler’s Number e 3698 

1.7 Stirling’s factorial approximation 3698 

1.8 Linear System of Equations 3701

1.8.1 One equation with one unknown 3701

1.8.2 Two equations with two unknowns 3701 

1.8.3 Three equations with three unknowns 3703 


1.8.4 n equations with n unknowns 3706 

1.9 Polynomials 3707 

1.10 Regression Techniques 3710 

1.10.1 Univariate linear regression model 3713

Regression line 3714

Least Squares Method (LSM) 3715

1.10.1.3 Univariate Regression Variance Analysis 3718

1.10.1.4 F-test for Regression (significance test for linear regression) 3724

1.10.2 Univariate linear regression Gaussian Model 3732

Pearson Correlation Coefficient Test 3740

Confidence interval of predicted values 3742

1.10.3 Linear univariate regression forced through the origin 3749

1.10.4 Deming regression (orthogonal regression) 3750 

1.10.5 Multiple linear regression Gaussian Model 3756

Variance Inflation Factor (multicollinearity) 3764

1.10.6 Polynomial regression 3769 

1.10.7 Logistic Regressions (LOGIT) 3771

Binomial Logistic Regression 3771

ROC and Lift curves 3780

1.11 Interpolation Techniques 3795 

1.11.1 Bezier Curves (B-Splines) 3795 

1.11.2 Euler Method 3802 

1.11.3 Polynomial of collocation 3804 

1.11.4 Lagrange polynomial interpolation method 3807

1.12 Roots search 3809 

1.12.1 Proportional parts methods 3809 

1.12.2 Bisection method 3811 

1.12.3 Secant method (Regula Falsi or False Position) 3814 

1.12.4 Newton’s method 3817 

1.13 Numerical Differentiation 3822 

1.14 Numerical Integration 3824 

1.14.1 Rectangles method 3825 

1.14.2 Trapezoidal method 3826 

1.15 Optimization 3827 

1.15.1 Linear programming (Linear Optimization) 3828

Graphical LP resolution 3831

Algebraic LP resolution 3832

Simplex algorithm LP resolution 3838

1.15.2 Nonlinear programming (Nonlinear optimization) 3844

Substitution Method 3847

Lagrange Multipliers Method 3848

Newton-Raphson Method (Quadratic Newton) 3852

Gauss-Newton Method (Tangent Newton) 3857

1.16 Resampling statistics 3865 

1.16.1 Monte Carlo Simulations 3866

Inverse Transform Sampling 3868

Random number generation 3870

Monte Carlo integration 3874

Monte Carlo Estimation of Pi 3875

Monte Carlo Modeling 3877

1.16.2 Bootstrapping 3881 

1.16.3 Jackknifing (jackknife resampling) 3887

1.17 Finite difference method (F.D.M.) 3890 

1.17.1 One space dimension F.D.M 3890

von Neumann stability 3893

1.17.2 Space-time F.D.M (finite-volume method) 3896 

1.18 Data Mining 3904 

1.18.1 Clustering 3915 

1.18.2 Regression and classification trees 3918 

1.18.3 K-Means 3927

1.18.4 Hierarchical Ascendant Classification (HAC) Dendrograms 3934

1.18.5 Neural networks 3940

Neuron model 3942

Transfer functions 3946

Network Architecture 3947

1.18.6 Genetic Algorithms 3959

Encoding and Initial population 3963

Operators 3965

Operator of selection 3966

Crossover operator 3967

Mutation operator 3968

2 Fractals 3974 

2.1 IFS Fractals 3976 

2.1.1 Fractals Metric Space 3991 

2.2 Fractals Visualization 3995 

2.2.1 Cantor’s Fractal (Cantor Set) 3995 

2.2.2 Triangle Sierpinski Fractal 3997 

2.2.3 Sierpinski carpet fractal 4003 

2.2.4 Fractal spirals 4007 

2.2.5 Von Koch fractal (Koch snowflake) 4009 

2.2.6 Natural fractals 4013 

2.2.6.1 Branch 4014

2.2.6.2 Snowflake 4016

2.2.6.3 Tree 4019

2.2.6.4 Fern 4021

2.3 Escape Time Algorithm Fractals 4024 

2.3.1 Mandelbrot set 4026 

2.3.2 Julia set 4030 

2.3.3 Newton set 4034 

3 Logical Systems 4036

3.1 Strict Logic 4036

3.1.1 Boolean Algebra 4037

3.1.2 Logical Functions 4044

3.1.3 Karnaugh maps 4047 

4 Error-Correcting Codes 4048 


4.1 Checksum 4052 

4.1.1 Luhn algorithm 4052 

4.2 Check Digit 4054 

4.2.1 European Article Numbering (EAN-13) 4054

4.2.2 Swiss Post Payment slip 4056 

4.2.3 International Bank Account Number (IBAN) 4058 

4.2.4 UIC wagon numbers 4059 

4.3 Permutations 4061 

4.4 Encoders 4062 

4.4.1 Block code 4066 

4.4.2 Systematic codes 4073 

5 Automata Theory 4076 

5.1 Von Neumann machine 4078 

5.2 Turing machine 4079 

5.3 Chomsky hierarchy 4083 

5.3.0.1 Formal language 4083

6 Cryptography 4085 

6.1 Cryptographic systems 4086 

6.1.1 Kerckhoffs’ principle 4090 

6.2 Traps 4091 

6.3 Secret-key encryption system 4092 

6.3.1 Feistel Schemes 4094 

6.4 Public key encryption 4099 

6.4.1 Diffie-Hellman protocol 4101 

6.4.2 Elliptic Curves Cryptography 4102 

6.4.2.1 Plane curves 4102

6.4.2.2 Plane curves of low degree 4103

6.4.2.3 Rational points on the unit circle 4104

6.4.2.4 Elliptic curves and Bezout’s theorem 4106

6.4.2.5 The addition law on elliptic curves 4107

6.4.2.6 Generating all the rational points 4108

6.4.2.7 Beyond elliptic curves 4109

7 Quantum Computing 4110 

7.1 Schrodinger’s Cat superposition 4112 

7.2 Photon polarization 4113 

7.3 Qubit 4116 

7.3.1 Bloch sphere 4122 

7.3.1.1 Qubit of polarization 4127

7.3.1.2 1/2 spin Qubit 4129

7.4 Quantum logic gates 4134 

14 Social Sciences 4137 

1 Population Dynamics 4139 

1.1 Birth rate and mortality tables (biometric features) 4139 

1.1.1 Population Renewal 4148

1.2 Population Models 4150 

1.2.1 Exponential model 4150 

1.2.2 Deterministic Logistic Model (Verhulst) 4153


1.2.3 Chaotic Logistic Model 4157 

1.2.3.1 Feigenbaum’s Bifurcation Diagram 4161

1.2.4 Malthusian Growth Law 4165 

1.2.5 Leslie model 4166 

1.2.6 SIR Model for Spread of Disease 4168 

1.2.7 Lotka-Volterra predator-prey model 4171

1.3 Schaefer’s Optimal capture model 4180 

1.4 Hardy-Weinberg model 4182 

1.5 Mendel’s law 4188 

1.6 Growth rate with temperature 4189 

2 Game and Decision Theory 4190 

2.1 Behavioral decision bias (cognitive bias) 4194

2.1.1 Sunk Cost 4196 

2.1.2 Anchoring Bias 4197 

2.2 Utility 4199 

2.2.1 Pareto Optimum 4199 

2.2.2 Nash Equilibrium 4200 

2.3 Games Representations 4203 

2.3.1 Extensive representation of a Game 4205 

2.3.2 Extensive representation of a Decision 4206 

2.3.2.1 Real Options 4213

2.3.3 Normal representation of a Game 4215 

2.3.3.1 Repetitive Games 4222

2.3.4 Set representation of a Game 4224 

2.3.5 Graphical representation of a Game 4227 

2.4 Expected Utility 4231 

2.4.1 Hurwitz Criteria 4231 

2.4.2 Laplace Criteria 4234 

2.5 Evolutionary Game 4235 

2.5.1 Dove & Hawk game in pure strategy (without probabilities) 4238

2.5.2 Dove & Hawk game in stable evolutionary strategy (with probabilities) 4240

2.6 Cournot Competition 4242 

2.7 Markov Decision Processes (MDPs) 4246 

2.8 Multi-Criteria Decision Making (MCDM) 4252 

2.8.1 Analytic Hierarchy Process (AHP) 4252 

3 Economy 4263 

3.1 Concepts 4263 

3.1.1 Microeconomics 4264 

3.1.1.1 Average & Marginal Cost/Revenue 4271

3.1.2 Macroeconomics 4276 

3.1.2.1 Cobb-Douglas Model 4277

3.2 Monetary Model 4283 

3.2.1 Walras’ law 4285 

3.3 Price Index and GDP 4290 

3.3.1 Paasche and Laspeyres price indices 4291 

3.3.2 Fisher index and Marshall-Edgeworth index 4292 

3.3.3 Gross domestic product (GDP) 4293 


3.4 Supply and Demand Theory 4295 

3.4.1 Expected utility theory 4295 

3.5 Net gain/loss opposite feedback model 4301 

3.6 Capitalization and Actuarial 4309 

3.6.1 Dates Intervals 4311 

3.6.2 Rates Equivalence 4314 

3.6.3 Simple Interest 4315 

3.6.3.1 Discounts 4316

3.6.4 Compound Interest 4318 

3.6.5 Continuous Interest 4320 

3.7 Progressive interest (annuities) 4321 

3.7.1 Postnumerando annuities 4322 

3.7.2 Praenumerando annuities 4325 

3.8 Rounding 4327 

3.9 Loans Amortization/Repayments 4330 

3.9.1 Fixed-Term Loan 4332 

3.9.2 Loan with constant amortization 4333 

3.9.3 Loan with constant annuity 4333 

3.10 Modern Portfolio Theory 4336 

3.10.1 No Arbitrage Opportunity (N.A.O.) 4339

3.10.2 Portfolios 4344

Stocks (shares of stocks)/Equities 4352

Dividend Yield 4354

Shares Benchmark Indices 4357

Durand Model 4358

Obligations (Bonds) 4362

Warrants 4375

Futures & Forwards 4379

Futures & Forwards naive pricing 4384

Futures & Forwards commodity hedging 4388

Options 4394

Returns and Investments rates 4406

Return on Investment 4407

Internal Rate of Return 4408

Money Weighted Rate of Return (M.W.R.R.) 4409

Time Weighted Rate of Return (T.W.R.R.) 4412

Theory of Speculation 4414

Portfolio efficient diversification models 4423

Overall minimum variance portfolio (Markowitz portfolio) 4424

Overall minimum Sharpe portfolio 4435

Capital Asset Pricing Model (CAPM) 4453

Black & Scholes option pricing models 4468

Put-Call parity equation 4469

Efficient Market Hypothesis 4472

Wiener Process 4473

Generalized Brownian motion 4477

Brownian bridge 4481

Ito process 4484

Black & Scholes Equation 4497

Self-financing portfolio on underlying 4500

Greeks and others 4501

Solving the Black & Scholes equation 4505

Binomial Pricing (CRR model) 4517

VIX volatility index 4531

Value at Risk 4538

Relative Value at Risk 4539

Absolute Value at Risk 4544

Delta-Normal Value at Risk 4546

Historical Value at Risk 4547

Credit Value at Risk 4547

Operational Value at Risk 4549

Variance-Covariance Value at Risk 4551

Variance-Covariance Value at Risk 4552

Back-Testing Value at Risk 4554

3.11 Time Series Analysis 4556 

3.11.1 Type of Errors 4561 

3.11.2 Decompositions 4562 

3.11.3 Types of Forecasting Models 4568

Simple Moving Average (moving average smoothing) 4571

3.11.3.2 Linear Model With Seasonal Coefficients (LMSC) 4574

3.11.3.3 Simple Exponential Smoothing (EWMA) 4579

Double Exponential Smoothing with One Parameter (Brown Method) 4588

Holt’s Double Exponential Smoothing with 2 Parameters (Additive Method) 4596

Holt’s and Winter Triple Exponential Smoothing with 3 Parameters (Multiplicative Method) 4601

Logistic Model 4606

3.11.4 Autoregressive Models 4613

3.11.4.1 AR(p) Autoregressive processes 4622

3.11.4.2 MA(q) Moving Average stationary process 4624

ARMA(p, q) Autoregressive non-seasonal moving average processes 4626

ARIMA(p, d, q) Autoregressive non-seasonal integrated moving average processes 4627

3.11.5 Durbin-Watson autocorrelation test 4631

4 Quantitative Management 4634

4.1 Corporate Finance Management 4637 

4.1.1 Basic Accounting Equation 4638 

4.1.2 Ratio Analysis 4639 

4.1.2.1 Short term solvency or liquidity measure 4640

4.1.2.2 Long term solvency or liquidity measure 4641

4.1.2.3 Profitability Measures 4642

4.1.2.4 Growth Rate 4643

4.1.2.5 Asset management or turnovers measures 4644

4.1.2.6 Market Value Measures 4644

4.1.3 Weighted Average Cost of Capital (WACC) 4646 

4.1.4 Break-even Point Analysis (BEPA) 4649 

4.1.5 Investment Strategies 4651 

4.1.5.1 Net Present Value 4652

4.1.5.2 Internal Rate of Return 4657

4.1.5.3 Internal Rate of Return 4659

4.1.6 Company Valuation Methods 4659 

4.1.6.1 Balance sheets-based method 4661

4.1.6.1.1 Book Value 4661

4.1.6.1.2 Adjusted Book Value 4662

4.1.6.1.3 Liquidation Value 4663

4.1.6.1.4 Substantial Value 4663

4.1.6.1.5 Book Value and Market Value 4664

4.1.6.2 Income Statement-Based Methods 4664

4.1.6.2.1 Value of Earnings with PER 4665

4.1.6.2.2 Value of Dividends 4666

4.1.6.2.3 Sales Multiples 4667

4.1.6.3 Goodwill-Based Methods 4667

4.1.7 Capital Goods 4669 

4.1.7.1 Linear Amortization 4669

4.1.7.2 Arithmetic Declining Amortization 4670

4.1.7.3 Geometric Declining Amortization (declining balance) 4671

4.1.8 Wages model 4672 

4.2 Project Management 4674 

4.2.1 Probabilistic PERT 4674 

4.2.2 Project planning variance reduction 4683 

4.2.3 Process Reliability 4685 

4.3 Lean Management (Six Sigma Process) 4687 

4.3.1 Pareto Analysis 4690 

4. 3. 1.1 Gini Index 4694 

4.3.2 Weighted Ishikawa Diagram 4697 

4.4 Supply Chain Management 4701 

4.4.1 Supply Chain Management in uncertain future 4703 

4.4.2 Optimal initial stock management with zero rotation 4705 

4.4.3 Wilson’s Models 4712 

4.4.3.1 Wilson’s model with resupply 4717

4.4.3.2 Wilson’s model without resupply 4726

4.4.3.3 Wilson’s model with resupply and break-up 4728

4.5 Queueing Theory 4733 

4.5.1 M/M/... arrival times modeling 4737

4.5.2 M/M/... service times modeling 4742

4.5.3 Kendall queues notation 4744 

4.5.4 Modeling of arrivals and departures M/M/1 4746 

4.5.5 Probability of standby in M/M/k/k queue (Erlang-B formula) 4753

4.5.6 Probability M/M/K/+∞ of standby (Erlang-C formula) 4756

4.6 Insurance 4761 

4.6.1 Premium pricing 4763 


4.7 Sensitivity Analysis 4770 

4.7.1 Direct Bias Method 4772 

4.7.2 Correlation Method 4776 

5 Music Maths (physics of hearing) 4781 

5.1 Longitudinal Sound Waves 4781 

5.1.1 Power carried by a sound wave 4788 

5.1.2 Measuring the intensity of a sound 4792 

5.2 Spherical Sound Waves 4794 

5.3 Doppler effect 4795 

5.3.1 Fixed source-Moving observer 4796 

5.3.2 Moving source-Fixed observer 4797 

5.3.3 Moving source and observer 4799 

5.4 Shockwaves 4800 

5.5 Music Scales 4802 

5.6 Harmonic Oscillator 4805 

5.6.1 Damped oscillator 4807 

15 Engineering 4809 

1 Marine & Weather Engineering 4811 

1.1 Visual horizon 4813

1.2 Wind direction 4817 

1.3 Atmospheric Profile Models 4820 

1.3.1 Atmospheric Exponential Profile Model 4820 

1.3.2 Adiabatic Atmosphere Model 4824 

1.3.2.1 Hypsometric equation 4826

1.4 Planetary equilibrium temperature 4828

1.4.1 Greenhouse effect 4830

1.4.2 Milankovitch cycles 4831 

1.5 Weather (sounding) balloon 4832 

1.6 Cyclogenesis and Anticyclogenesis 4836 

1.7 Tides 4845 

1.7.1 First approach 4846 

1.7.2 Second approach 4849 

1.8 Lorenz equation 4856 

1.8.1 Rayleigh-Benard convection cells (Benard-Marangoni instability) 4866

1.8.2 Lorenz attractor and chaos 4874 

1.9 Waves 4881 

1.9.1 Depth of a wave 4892

1.9.2 Wave’s amplitude 4893 

2 Mechanical Engineering 4895 

2.1 Gears 4895 

2.1.1 Transmission ratios 4899 

2.1.2 Gears association 4902 

2.1.2.1 Odd/Even Gear "problem" 4906

2.1.3 Type of Gears 4908 

2.2 Strength of materials 4913 

2.2.1 Quadratic moments 4916 


2.2.2 Equation of the elastic line 4918 

2.2.2.1 Euler-Bernoulli Beam equation 4925

2.2.2.2 Potential elastic energy 4931

2.2.3 Torsion 4933 

2.2.4 Buckling 4937 

2.2.5 Traction 4942 

3 Electrical Engineering 4943 

3.1 Elementary Primitive Electrical Symbols 4944 

3.2 Alternating current vs. Direct current 4951

3.2.1 Average power 4953 

3.3 Transformers 4957 

3.3.1 Transformer universal EMF equation 4963 

3.4 Steady State linear circuits 4964 

3.4.1 RC series circuit 4964 

3.4.2 RL series circuit 4967 

3.4.3 RLC circuit 4970 

3.4.3.1 Critically damped response 4972

3.4.3.2 Overdamped response (hypercritic) 4973

3.4.3.3 Underdamped response (decaying oscillation) 4975

3.5 Linear circuit in forced regime 4980 

3.5.1 Low-pass filter 4982 

3.5.2 High-pass filter 4984 

3.5.3 Integrator and differentiator 4985 

4 Civil Engineering 4987 

4.1 Static 4988 

4.2 Pulleys 4989 

4.2.1 Windlass 4995 

4.3 Cornu spiral 4997 

4.4 Overhead cable 5002 

4.4.1 Free overhead cable (catenary) 5003 

4.4.2 Charges overhead cable (suspended bridge) 5010 

4.4.3 Very tense cable 5013 

4.5 Falling chimney (naive approach) 5015 

4.6 Dams 5021 

5 Aerospace Engineering 5024 

5.1 Airfoil Lift 5025 

5.1.1 Newton’s lift argument (skipping stone argument) 5027 

5.1.2 Bernoulli’s lift argument (equal time argument) 5028 

5.1.3 Euler’s lift argument 5029 

5.1.4 Coanda lift argument 5030 

5.1.5 Kutta-Joukowski lift argument 5030 

5.2 Cosmic speeds 5031 

5.3 Fundamental Equation of Propulsion (Tsiolkovsky rocket equation) 5033

5.4 Geostationary orbit 5040 

5.5 Vis-Viva Equation 5044

5.6 Hohmann Transfer orbit 5046 

6 Software Engineering 5048 

6.1 Algorithm 5048 


6.2 Dichotomic Search algorithm 5049 

6.2.1 Bisection algorithm 5049 

6.2.2 Binary search algorithm 5052 

6.3 Tower of Hanoi algorithm 5054 

6.4 Sorting Algorithms 5060 

6.4.1 Bubble sort 5060 

6.4.2 Quicksort algorithm 5062 

6.5 Dijkstra’s algorithm 5064 

6.6 Google PageRank algorithm 5070 

6.6.1 Weighted Count 5072 

6.6.2 Recursive counting 5073 

6.6.3 Absorbing states 5078 

7 Industrial Engineering 5081 

7.1 Six Sigma 5082 

7.1.1 Quality Control 5084 

7.1.2 Defaults/Errors 5085 

7.1.3 Capability Indices 5090 

7.1.4 Quality Levels 5103 

7.2 Taguchi Model 5116 

7.3 Preventive Maintenance 5122 

7.3.1 Planned Obsolescence 5123 

7.3.2 Reliability Empirical Estimators 5125 

7.3.2.1 Average Failure Rate 5140

7.3.3 Weibull Distribution 5141 

7.3.3.1 Two-parameter Weibull distribution linearization 5146

7.3.4 Topology of Systems 5148 

7.3.4.1 Fault Tree Analysis 5160

7.3.4.2 Markov Chain Reliability Model 5162

7.3.5 Maximum Likelihood for failure rate determination of samples 5166

7.3.6 Kaplan-Meier Survival Rate 5168

7.3.7 ABC Method 5173 

7.4 Design of Experiments (DoE) 5178 

7.4.1 Two levels factorial Designs 5188

7.4.1.1 Replicated full factorial designs 5195

7.4.1.2 Plackett-Burman Designs 5201

7.4.1.3 Fractional Factorial Designs 5206

7.4.2 General factorial Designs 5216 

7.4.3 Taguchi Designs and Nomenclature (robust designs) 5230 

7.4.4 Response Surface Methodology (Box Domains) 5239 

7.4.4.1 Pure quadratic curvature test 5241

7.4.4.2 Box-Wilson Central Composite Designs 5243

7.4.4.2.1 Circumscribed Center Designs 5246

7.4.4.2.2 Face Centered Designs 5250

7.4.5 Optimal Designs 5254 

7.4.6 Mixture Design 5263 

7.4.6.1 Network mixture designs (simplex lattice designs) 5266

7.4.6.2 Full Factorial Combined with Mixture Design-Crossed Design 5282


7.4.7 General DoE diagnostic tools 5285 

7.4.7.1 Lenth’s PSE Pareto Margin Error for unreplicated factorial designs 5285

7.4.7.2 Pareto Margin Error for replicated factorial designs 5288

7.4.7.3 Desirability 5289

7.5 Quality Control on Reception (Lot Acceptance Sampling Plans) 5293

7.5.1 Simple acceptance sampling plan by measurement for a unique tolerance with known standard deviation 5296

7.5.1.1 Calculation of the parameters using the norms AF-X06-023 5304

7.5.2 Simple acceptance sampling plan by attribute 5307

7.5.2.1 Calculation of the parameters using the norm ISO 2859-1 5313

7.5.3 Double acceptance sampling plan by attribute 5315 

7.5.4 Operating characteristic curve (OC) 5316 

7.5.5 Average outgoing quality (AOQ) 5319 

7.6 Quality Control Charts (CC) 5323 

7.6.1 WECO’s empirical rules 5327 

7.6.2 Sample size and Sampling frequency for Control Charts 5329

7.6.3 Attributes Control Charts (qualitative CC) 5331 

7.6.3.1 P Control Charts (binomial proportion CC) 5333

7.6.3.2 NP Control Charts (binomial counting CC) 5337

7.6.3.3 C Control Charts (Poisson counting CC) 5340

7.6.3.4 U Control Charts (normalized Poisson) 5344

7.6.3.5 Laney’s p' and u' control charts 5348

7.6.4 Measurement Control Charts (quantitative CC) 5350 

7.6.4.1 Individual measurement control chart with required limits 5351

7.6.4.2 Individual measurement control chart with moving limits 5352

7.6.4.3 Subgroups measurement control chart with standard error 5358

7.6.4.4 S - S Subgroups measurement control chart for standard deviation 5363

7.6.4.5 X - S Subgroups measurement control chart 5371

7.6.4.6 X - Sp Subgroups measurement control chart with pooled variance 5375

7.6.4.7 R - R Subgroups measurement control chart 5380

7.6.5 Autocorrelated Measurement Control Charts (time weighted control charts) 5390

7.6.5.1 I - MR/X Individual moving range measurement control chart 5390

7.6.5.2 I - MR/MR Individual moving range measurement control chart 5394

7.6.5.3 Individual Moving Average control chart 5397

7.6.5.4 CUSUM (cumulated sum) control chart with empirical V-mask 5402

7.6.5.5 EWMA control charts (exponential weighted moving average) with fixed limits 5410

7.6.6 Rare events control charts 5417 

7.6.6.1 Frequency T control chart with probabilistic limits 5418

7.6.6.2 Frequency G control chart of rare events 5422

7.6.7 Control Charts Operating Characteristic (OC) Curves 5427

7.6.7.1 OC for X measurement control charts 5427

7.6.7.2 OC for P-type attribute control charts 5429

7.7 Design of reliability tests 5431

7.7.1 Chi-squared time of test 5432

7.7.2 Binomial sampling size 5434 

7.7.3 Beta-binomial sampling size 5435 

16 Epilogue 5439 

17 Biographies 5441 

A 5442 

B 5444 

C 5451 

D 5459 

E 5463 

F 5466 

G 5470 

H 5475 

I 5483 

J 5483 

K 5486 

L 5489

M 5497 

N 5504 

O 5508

P 5509 

R 5514 

S 5516 

T 5523 

V 5526 

W 5528 

Y 5531 

Z 5533 

18 Chronology 5534 

19 Humour 5565 

1 Situations 5566 

2 Mathematics 5579 

3 Physics 5597 

4 Statistics 5613 

5 Chemistry 5616 

6 Engineering 5622 


7 Computing 5630 

8 Social Sciences 5639 

20 Links 5645 

1 Exact Sciences 5646 

2 Publishing/Magazines 5647 

3 Associations 5649 

4 Jobs 5650 

5 Television/Radio 5650 

6 Other sciences 5651 

7 Softwares/Applications 5652 

21 Quotes 5655 

22 Change Log 5662 

23 Nomenclature 5680 

List of Figures 5684 

List of Tables 5719 

List of Algorithms 5725 

Bibliography 5727 

Index 5733 

24 Donate 5783 


Dedicated to Mother Nature 



1 Impressum 3 

1.1 Use of content 3 

1.2 How to use this book 4 

1.3 Data Protection 7 

1.4 Use of data 7 

1.5 Data transmission 7 

1.6 Agreement 7 

1.7 Errata 7 

2 License 8 

2.1 Preamble 8

2.2 Applicability and Definitions 9 

2.3 Verbatim Copying 10 

2.4 Copying in Quantity 10 

2.5 Modifications 11 

2.6 Combining Documents 12 

2.7 Collections of Documents 13 

2.8 Aggregation with independent Works 13

2.9 Translation 13 

2.10 Termination 13 

2.11 Future revisions of this License 14

3 Roadmap 15 



1.1 Use of content 

The contents of this book are developed through a process in which volunteers reach
a consensus. This process brings together volunteers and also seeks the point of view of
people interested in the topics of this book. The person in charge of this book administers
the process and establishes rules to promote fairness in the consensus approach. That person
is also responsible for drafting the text and sometimes for testing, evaluating or independently
verifying the accuracy or completeness of the presented information.

We decline all responsibility for any injury, damage or loss of any kind, whether special,
incidental, consequential or compensatory, arising from the publication, application or reliance
on the content of this book. We make no express or implied warranty on the accuracy or
completeness of any information published in this book, and do not guarantee that the
information contained in this book meets any specific need or goal of the reader. We do not
guarantee the performance of products or services of any manufacturer or vendor solely by
virtue of the content of this book.

The technical descriptions, procedures, and computer programs in this book have been developed
with care, but they are provided without warranty of any kind. We also make no
warranties that the equations, programs, and procedures in this book or its associated software
are free of error, or are consistent with any particular standard of merchantability, or will meet
your requirements for any particular application. They should not be relied upon for solving a
problem whose incorrect solution could result in injury to a person or loss of property. Any use
of the content of this book is at the reader’s own risk. The authors, editors, and publisher
disclaim all liability for direct, incidental, or consequential damages resulting from use of the
content of this book or the associated software.

By publishing texts, it is not the intention of this book to provide services on behalf of any
person or entity or to perform any task to be accomplished by any person or entity for the
benefit of a third party. Anyone using this book should rely on his or her own independent
judgment or, where appropriate, seek the advice of a qualified expert to determine how to
exercise reasonable care under all circumstances. The information and standards on the topics
covered by this book may be available from other sources that the reader may wish to consult
in search of points of view or additional information not covered by the contents of this book.

We have no power to enforce compliance with the contents of this book, and we do not
undertake to monitor or enforce such compliance. We do not certify, test or inspect
products, designs or installations for the safety or health of persons and property. Any
certification or other statement of compliance regarding information relating to health or safety
of persons and property, mentioned in this book, cannot possibly be attributed to the content
of this book and remains under the responsibility of the certification center or the concerned

1. Warnings 

EAME v3. 5-2013 

1.2 How to use this book 

At the university level, this book can be used for a Ph.D., graduate level or advanced
undergraduate level seminar in many exact and pure sciences fields. The seminars in which we use
this material are part of the Scientific Evolution Sàrl program, where the trainees typically have
already taken undergraduate or graduate courses in their respective specialization. In reality this
book also aims to cover the full Kindergarten to PhD curriculum.

Because the methods of Applied Mathematics are learned by practice and experience, we view 
a seminar on Applied Mathematics as a learning-by-doing (project oriented) seminar. We struc- 
ture our mathematical modelling seminars around a set of problems that require the trainee 
to construct models that help with planning and decision making. The imperative is that the 
models should be consistent with the theory and back-tested. To fulfill this imperative, it is 
necessary for the trainee to combine mathematical theory with modeling. The result is that the 
trainee learns the theory, and more importantly, learns how that theory is applied and combined 
in the real world. The ability to criticize and identify limitations of dangerous mathematical 
tools is the most valuable feature of our seminars. 

The problems with solutions in this book provide the opportunity to apply the text material to a 
comprehensive set of fairly realistic situations. By the end of the seminars the trainees will have 
enhanced their skills and knowledge of the most important theoretical and computing tools. 
These are valuable skills that are in demand by the businesses at the highest levels. 

It is very difficult to cover all the material in this book in a semester. It takes a lot of time to
explain the concepts to the trainees. The reader is encouraged to pick and choose which topics
will be covered during the term. It is not strictly necessary to cover them in sequence,
but doing so can help significantly.

In a nutshell, this book offers you a wide variety of topics that are amenable to modeling. All 
are practical. 

1.2.1 Ancillaries

We offer an array of ancillaries for students, instructors and practitioners.

First, there are some free companion eBooks and tools in French and English written by Vincent
ISOZ & Daname KOLANI for people who want to put into practice the theory presented in
this book.

Here is the list: 

• MATLAB™ in English (1,339 pages): 

• Maple in French (99 pages): 

• R in French (1,626 pages): 


info @ sciences. ch 


http://www.sciences.ch/dwnldbl/divers/R.pdf

• Minitab in French (1,092 pages): 

• Scientific Linux installation & Configuration (211 pages): 

Second, we offer a few quizzes and flashcards in French and English to challenge your
students, or just yourself, against the rest of the world:

• MATLAB™ Basics LI Challenge level in French (100 questions) start_session/a73647cf 3b/ 

• Astronomy /Astrophysics HI Challenge level in English (100 questions): start_session/ffd0810f aO/ 

• Greek Letter Flashcards (48 cards): 

http: / /www . scientific- evolution. com/qcm/f r/start_session/6d9f If ef 90/ 

• Common Derivatives Flashcards (29 cards): start_session/ cl5a40f 2c4/ 

• Common Primitives Flashcards (60 cards): start.session/ ccf c20fdef / 

• Common Trigonometric Identities Flashcards (68 cards): start_session/882f 9696cd/ 

• LaTeX L3 Challenge level in French (100 questions):

http: / /www . scientific- evolution. com/qcm/f r/ start _session/f fie ldlb91/ 

• R Software 3.1.2 L3 Challenge level in French (100 questions): start_session/2a6f ca7473/ 

• C++ L3 Challenge level in French (100 questions): start_session/e031ce4b43/ 

And as any technical book should have a forum, the reader can go through this link for any
discussions about the content of this book:


https://www.physicsforums.com

For those who prefer social networks, we also have a dedicated Facebook group:


Or for more fun (science pics, quotes, jokes, videos, etc.) there is also an associated Instagram
account:

https://www.instagram.com/opera.magistris/

And a selection of what we consider interesting scientific videos on our
YouTube channel:


As with this book, the companion books above are only samples of the complete ones. The full
versions, with perpetual free updates, are available for the price of $ 299.- each, and for $ 499.-
you get the exercise files and LaTeX sources (for information on purchase you can simply send
me an email).

Because this book mainly focuses on the mathematical aspects of physical phenomena, we can only
strongly recommend to the reader another free book that is, in our view, the
best one focusing on the popular science aspect of the subjects that we will cover:

Motion Mountain by Dr. Christoph Schiller: http : / /www 



1.3 Data Protection 

When looking at information on the Internet companion site, some data are automatically
saved. We try to save as little data as possible, for as short a time as possible. Wherever
we can, we save only anonymous data. We undertake to process the data you send us personally
with the utmost diligence.

However, your IP address, the source page that brought you to the site and the associated
keywords are freely available to everybody for the current month, after which the detailed
data are destroyed. You can object at any time to the publication of your data by contacting us.

1.4 Use of data 

Your data are only used for sending the newsletter. Communication of personal 
data (except the e-mail address, title and name) is optional. When registering for the newsletter, 
you can of course specify an alternate address and/or a fictitious name. 

1.5 Data transmission 

We will never sell or commercialize the data of our customers or interested parties and will
never infringe the rights of the person. In addition, we will not rent mailing lists and will not
send you advertising from third parties or on our behalf.

1.6 Agreement 

When you provide us with personal information, you authorize us to save and use it within
the meaning of the Swiss Federal Law on Data Protection. If you ask us not to send you emails,
we are obliged, in your interest, to save your e-mail address in an internal negative list.

1.7 Errata 

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If
you find a mistake in this book, whether in the text, scripts or illustrations, we would be
grateful if you would report it to us. By doing so, you can save other readers from frustration
and help us improve subsequent versions of this book. Our e-mail is given in the footer of every
page of this book. Once your errata are verified, your submissions will be accepted and the
correction will be listed in the change log of updated versions.




The entire content of this book is subject to the GNU Free Documentation License, which means:

• that everyone has the right to freely use the texts for non-commercial usage (Google Ads 
or any equivalent being considered as a commercial usage!) 

• that any person is authorized to broadcast items for non-commercial usage (Google Ads 
or any equivalent being considered as a commercial usage!) 

• that anyone can freely edit the texts for non-commercial usage (Google Ads or any
equivalent being considered as a commercial usage!)

and bla bla bla... 

in accordance with the license described below: 

Version 1.1, March 2000 

Copyright (C) 2000 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license
document, but changing it is not allowed.

2.1 Preamble 

The purpose of this License is to make a manual, textbook, or other written document "free" in 
the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with 
or without modifying it, only for a non-commercial purpose. Secondarily, this License preserves
for the author and publisher a way to get credit for their work, while not being considered 
responsible for modifications made by others. 

This License is a kind of "copyleft", which means that derivative works of the document must 
themselves be free in the same sense. It complements the GNU General Public License, which 
is a copyleft license designed for free software. 

We have designed this License in order to use it for manuals for free software, because free 
software needs free documentation: a free program should come with manuals providing the 
same freedoms that the software does. But this License is not limited to software manuals; it 
can be used for any textual work, regardless of subject matter or whether it is published as a 
printed book. We recommend this License principally for works whose purpose is instruction 
or reference. 


2.2 Applicability and Definitions 

This License applies to any manual or other work that contains a notice placed by the copyright 
holder saying it can be distributed under the terms of this License. The "Document", below, 
refers to any such manual or work. Any member of the public is a licensee, and is addressed as
"you".

A "Modified Version" of the Document means any work containing the Document or a portion 
of it, either copied verbatim, or with modifications and/or translated into another language. 

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals 
exclusively with the relationship of the publishers or authors of the Document to the Document’s 
overall subject (or to related matters) and contains nothing that could fall directly within that 
overall subject. (For example, if the Document is in part a textbook of mathematics, a Secondary 
Section may not explain any mathematics.) The relationship could be a matter of historical 
connection with the subject or with related matters, or of legal, commercial, philosophical, 
ethical or political position regarding them. 

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being 
those of Invariant Sections, in the notice that says that the Document is released under this
License.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or 
Back-Cover Texts, in the notice that says that the Document is released under this License. 

A "Transparent" copy of the Document means a machine-readable copy, represented in a for- 
mat whose specification is available to the general public, whose contents can be viewed and 
edited directly and straightforwardly with generic text editors or (for images composed of pix- 
els) generic paint programs or (for drawings) some widely available drawing editor, and that 
is suitable for input to text formatters or for automatic translation to a variety of formats suit- 
able for input to text formatters. A copy made in an otherwise Transparent file format whose 
markup has been designed to thwart or discourage subsequent modification by readers is not 
Transparent. A copy that is not "Transparent" is named "Opaque". 

Examples of suitable formats for Transparent copies include plain ASCII without markup, Tex- 
info input format, LaTeX input format, SGML or XML using a publicly available DTD, and 
standard-conforming simple HTML designed for human modification. Opaque formats include 
PostScript, PDF, proprietary formats that can be read and edited only by proprietary word pro- 
cessors, SGML or XML for which the DTD and/or processing tools are not generally available, 
and the machine-generated HTML produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as 
are needed to hold, legibly, the material this License requires to appear in the title page. For 
works in formats which do not have any title page as such, "Title Page" means the text near the 
most prominent appearance of the work’s title, preceding the beginning of the body of the text. 


2.3 Verbatim Copying 

You may copy and distribute the Document in any medium, noncommercially, provided that 
this License, the copyright notices, and the license notice saying this License applies to the 
Document are reproduced in all copies, and that you add no other conditions whatsoever to 
those of this License. You may not use technical measures to obstruct or control the reading or 
further copying of the copies you make or distribute. However, you may accept compensation 
in exchange for copies. If you distribute a large enough number of copies you must also follow 
the conditions in section 3. 

You may also lend copies, under the same conditions stated above, and you may publicly display
copies.

2.4 Copying in Quantity 

If you publish printed copies of the Document numbering more than 100, and the Document’s 
license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly 
and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts
on the back cover. Both covers must also clearly and legibly identify you as the publisher 
of these copies. The front cover must present the full title with all words of the title equally 
prominent and visible. You may add other material on the covers in addition. Copying with 
changes limited to the covers, as long as they preserve the title of the Document and satisfy 
these conditions, can be treated as verbatim copying in other respects. 

If the required texts for either cover are too voluminous to fit legibly, you should put the first 
ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you 
must either include a machine-readable Transparent copy along with each Opaque copy, or 
state in or with each Opaque copy a publicly-accessible computer-network location containing a
complete Transparent copy of the Document, free of added material, which the general network- 
using public has access to download anonymously at no charge using public-standard network 
protocols. If you use the latter option, you must take reasonably prudent steps, when you begin 
distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus 
accessible at the stated location until at least one year after the last time you distribute an Opaque 
copy (directly or through your agents or retailers) of that edition to the public. 

It is requested, but not required, that you contact the authors of the Document well before 
redistributing any large number of copies, to give them a chance to provide you with an updated 
version of the Document. 



2.5 Modifications 

You may copy and distribute a Modified Version of the Document under the conditions of 
sections 2 and 3 above, provided that you release the Modified Version under precisely this 
License, with the Modified Version filling the role of the Document, thus licensing distribution 
and modification of the Modified Version to whoever possesses a copy of it. In addition, you 
must do these things in the Modified Version: 

• Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, 
and from those of previous versions (which should, if there were any, be listed in the 
History section of the Document). You may use the same title as a previous version if the 
original publisher of that version gives permission. 

• List on the Title Page, as authors, one or more persons or entities responsible for au- 
thorship of the modifications in the Modified Version, together with at least five of the 
principal authors of the Document (all of its principal authors, if it has less than five). 

• State on the Title page the name of the publisher of the Modified Version, as the publisher. 

• Preserve all the copyright notices of the Document. 

• Add an appropriate copyright notice for your modifications adjacent to the other copyright
notices.

• Include, immediately after the copyright notices, a license notice giving the public per- 
mission to use the Modified Version under the terms of this License, in the form shown 
in the Addendum below. 

• Preserve in that license notice the full lists of Invariant Sections and required Cover Texts 
given in the Document’s license notice. 

• Include an unaltered copy of this License. 

• Preserve the section entitled "History", and its title, and add to it an item stating at least 
the title, year, new authors, and publisher of the Modified Version as given on the Title 
Page. If there is no section entitled "History" in the Document, create one stating the title, 
year, authors, and publisher of the Document as given on its Title Page, then add an item 
describing the Modified Version as stated in the previous sentence. 

• Preserve the network location, if any, given in the Document for public access to a Trans- 
parent copy of the Document, and likewise the network locations given in the Document 
for previous versions it was based on. These may be placed in the "History" section. You 
may omit a network location for a work that was published at least four years before the 
Document itself, or if the original publisher of the version it refers to gives permission. 

• In any section entitled "Acknowledgements" or "Dedications", preserve the section’s title, 
and preserve in the section all the substance and tone of each of the contributor acknowl- 
edgements and/or dedications given therein. 

• Preserve all the Invariant Sections of the Document, unaltered in their text and in their 
titles. Section numbers or the equivalent are not considered part of the section titles. 


• Delete any section entitled "Endorsements". Such a section may not be included in the 
Modified Version. 

• Do not retitle any existing section as "Endorsements" or to conflict in title with any In- 
variant Section. 

• If the Modified Version includes new front-matter sections or appendices that qualify as 
Secondary Sections and contain no material copied from the Document, you may at your 
option designate some or all of these sections as invariant. To do this, add their titles to 
the list of Invariant Sections in the Modified Version’s license notice. These titles must 
be distinct from any other section titles. 

• You may add a section entitled "Endorsements", provided it contains nothing but endorse- 
ments of your Modified Version by various parties-for example, statements of peer review 
or that the text has been approved by an organization as the authoritative definition of a
standard.

• You may add a passage of up to five words as a Front-Cover Text, and a passage of up 
to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified 
Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added 
by (or through arrangements made by) any one entity. If the Document already includes 
a cover text for the same cover, previously added by you or by arrangement made by the 
same entity you are acting on behalf of, you may not add another; but you may replace 
the old one, on explicit permission from the previous publisher that added the old one. 

• The author(s) and publisher(s) of the Document do not by this License give permission 
to use their names for publicity for or to assert or imply endorsement of any Modified
Version.

2.6 Combining Documents 

You may combine the Document with other documents released under this License, under the 
terms defined in section 4 above for modified versions, provided that you include in the combi- 
nation all of the Invariant Sections of all of the original documents, unmodified, and list them 
all as Invariant Sections of your combined work in its license notice. 

The combined work need only contain one copy of this License, and multiple identical Invariant 
Sections may be replaced with a single copy. If there are multiple Invariant Sections with the 
same name but different contents, make the title of each such section unique by adding at the 
end of it, in parentheses, the name of the original author or publisher of that section if known, 
or else a unique number. Make the same adjustment to the section titles in the list of Invariant 
Sections in the license notice of the combined work. 

In the combination, you must combine any sections entitled "History" in the various original
documents, forming one section entitled "History"; likewise combine any sections entitled
"Acknowledgements", and any sections entitled "Dedications". You must delete all sections
entitled "Endorsements".



2.7 Collections of Documents 

You may make a collection consisting of the Document and other documents released under 
this License, and replace the individual copies of this License in the various documents with a 
single copy that is included in the collection, provided that you follow the rules of this License 
for verbatim copying of each of the documents in all other respects. 

You may extract a single document from such a collection, and distribute it individually under 
this License, provided you insert a copy of this License into the extracted document, and follow 
this License in all other respects regarding verbatim copying of that document. 

2.8 Aggregation with independent Works

A compilation of the Document or its derivatives with other separate and independent docu- 
ments or works, in or on a volume of a storage or distribution medium, does not as a whole 
count as a Modified Version of the Document, provided no compilation copyright is claimed for 
the compilation. Such a compilation is named an "aggregate", and this License does not apply 
to the other self-contained works thus compiled with the Document, on account of their being 
thus compiled, if they are not themselves derivative works of the Document. 

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then 
if the Document is less than one quarter of the entire aggregate, the Document’s Cover Texts 
may be placed on covers that surround only the Document within the aggregate. Otherwise they 
must appear on covers around the whole aggregate. 

2.9 Translation 

Translation is considered a kind of modification, so you may distribute translations of the Doc- 
ument under the terms of the corresponding section about transformation. Replacing Invariant 
Sections with translations requires special permission from their copyright holders, but you may 
include translations of some or all Invariant Sections in addition to the original versions of these 
Invariant Sections. You may include a translation of this License provided that you also include 
the original English version of this License. In case of a disagreement between the translation 
and the original English version of this License, the original English version will prevail. 

2.10 Termination 

You may not copy, modify, sublicense, or distribute the Document except as expressly provided 
for under this License. Any other attempt to copy, modify, sublicense or distribute the Docu- 
ment is void, and will automatically terminate your rights under this License. However, parties 
who have received copies, or rights, from you under this License will not have their licenses 
terminated so long as such parties remain in full compliance. 


2.11 Future revisions of this License 

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation
License from time to time. Such new versions will be similar in spirit to the present version,
but may differ in detail to address new problems or concerns. See

Each version of the License is given a distinguishing version number. If the Document specifies
that a particular numbered version of this License "or any later version" applies to it, you
have the option of following the terms and conditions either of that specified version or of any
later version that has been published (not as a draft) by the Free Software Foundation. If the
Document does not specify a version number of this License, you may choose any version ever
published (not as a draft) by the Free Software Foundation.

Please consider the environment before 





This book has a simple progression rule: one new A4 page per day since May 2001 on
subjects that interest the supervisor of the Sciences.ch distribution of the book Opera Magistris.
The following subjects are already planned for the near or far future, with the same
level of detail and pedagogical approach in the proofs:

• Probabilities:

- Bayesian conjugation for Normal and Binomial law

• Statistics: 

- Mode and Median of statistical laws 

- Semi-variance 

- Partial and semi-partial correlation 

- M-Estimators for location and for dispersion

- Likelihood of censored data 

- Jensen Inequality 

- Normal Law Entropy 

- Maximum likelihood Test 

- Propensity score

- Equivalence test 

- Quasi-correlation matrix 

- Factorial Analysis 

- Hotelling T-Test 

- Welch Test with Welch-Satterthwaite equation


- Wald-Wolfowitz Test (binary sequence) 

- Levene-Wolfowitz Test¹ (continuous up/down sequence)

- Odds Ratio and its confidence interval 

- Risk Ratio and its confidence interval 

- Ellipse of control 

- Poisson Model for the average (2D) spatial distance 

- Canonical Correlation 

¹ also named "turning point test" or "trend test"


- Intraclass correlation coefficient (ICC) 

- G-test of periodicity 

- Gaussian and Student copula 

- Hierarchical Fixed Factor ANOVA 

- Square Latin ANOVA without replication 

- Introduction to MANOVA

- Extreme Values Theorem 

- Survey Theory 

• Sequences and Series: 

- Properties of Fourier transforms 

- Laplace Transform 

- Z transform (common Z transforms, inverse common Z transforms) 

• Differential Calculus 

- O.D.E. classification 

- Lebesgue Integral with numerical application in MATLAB™ 

- Laplace Method 

- Continuous and Discrete Convolution 

• Functional Analysis: 

- Convexity and Concavity of a function 

• Complex Analysis: 

- Residue Theorem for polynomial ratios 

• Topology: 

- Mahalanobis Distance 

• Analytical Geometry: 

- Classification of ellipses with the determinant 

• Differential Geometry: 

- Normal coordinates 

- Gauss curvature 

- Isoperimetric plane theorem 

• Mechanics: 

- Magnus effect 

• Optical Wave: 



- Fresnel Diffraction 

- Fraunhofer Diffraction

• Astronomy: 

- MacCullagh’s formula 

- Body flatness indirect calculation 

- Synchronous locking of tidally evolving satellites

• General Relativity: 

- Real volume of an object in General Relativity 

- Einstein radius derivation 

- Gravitational Waves 

• Cosmology: 

- Friedmann-Lemaître metric derivation

• Chemistry: 

- Molecular Rotational Energy and Electron Transitions 

- Vibrational Energy of Molecules 

- Vibrational plus Rotational Energy of Molecules 

• Numerical Methods: 

- Univariate optimization problem with substitution method 

- Acceptance/Rejection Sampling

- Gibbs Sampling 

- Outliers vs Influential values 

- Generalized Linear Models (Gauss, Poisson, Negative Binomial, Gamma)

- Logistic regression based on maximum likelihood 

- Cronbach coherence indicator 

- Linear discriminant Analysis 

- Quadratic discriminant Analysis 

- Multidimensional scaling (MDS) 

- Linear Mixture Model (LMM) 

- Kernel Smoothing 

- Mean Shift 

- PLS Regression 

- Factorial Analysis 

- Correspondence Factorial Analysis 

- Generalized Reduced Gradient (GRG) optimization method


• Mechanical Engineering: 

- Self-buckling (tallest column problem) 

• Industrial Engineering: 

- Box Domains 

- Central Composite Design 

- Center Face Cube Design 

- Cox Survival Model (Cox Proportional Hazard Model) 

- Structural Equation Modeling

- Accelerated life testing 

• Electronics: Microelectronics 

• Finance: 

- Continuous Yield rate 

- Zero-Coupon curve rates 

- Equivalence of a bond rate for a treasury bond

- Spot rate and Forward rate 

- Adjusting the beta of a portfolio with Futures 

- Cox-Ingersoll Future/Forward price equality 

- Solution of Black & Scholes ODE 

- Black Model 

- Macaulay Duration 

- Modified Duration 

- Modified Internal Rate of Return (MIRR) 

- Binomial Tree (Cox-Ross-Rubinstein) 

- Options Portfolio hedging 

* Protective Put/Call 

* Bull Spread/Call 

* Bear Spread/Call 

* Butterfly 

* Straddle 

* Strangle 

* Collar 

* Calendar spreads 

* Portfolio allocation methods 

• Optimal weighted portfolio for balanced risk 

• Optimal weighted portfolio for tracking error

• Optimal weighted Sharpe's portfolio



• Optimal weighted portfolio with maximum diversification 

• Optimal market-bench weighted Treynor-Black Portfolio 

- Surplus at Risk (SVaR) 

- Default Credit Risk (based on Standard & Poor rating) 

- VaR Equity Coverage 

- Conditional VaR loss (CVaR)

- Fokker-Planck equation

- ARCH-GARCH stochastic process 

- Vector autoregressive models for multivariate time series 

• Quantitative Management: 

- Gale-Shapley Algorithm 

- Newsvendor problem 

- Bullwhip Effect

- Condorcet paradox 

- Computerized Relative Allocation of Facilities Technique (CRAFT) 

- Real options 

- Procedural Hierarchical Analysis 

- Deferred capital in case of survival (life insurance)

- Modified Duration 

- Deferred temporary death benefit (life insurance)

Remember that the LaTeX sources of this book can currently be obtained depending on your
donation on Patreon, PayPal or Tipeee.

Like every robust product, this book has a lifecycle. The lifecycle begins when a product is
released and ends when it is no longer supported. Knowing key dates in this lifecycle helps you
make informed decisions about when to upgrade. This book has the following lifecycle: a new major
or minor version is published on the 1st of every month of the Gregorian calendar and can be
downloaded by clicking on the following button (270 MB PDF...):




or, if this link does not work, a copy of the PDF is available on the Internet Archive:



To quote this book: 

@book{OperaMagistris2013v3,
  author = {Vincent Isoz and Leon Harmel},
  title = {Opera Magistris - Elements of Applied Mathematics for Engineers},
  year = {2014},
  publisher = {Sciences.ch},
  keywords = {science, physics, maths, engineering, finance, management},
  isbn = {978239909327},
}






The ideas in this book have been developed and reinforced by many people. I have greatly
benefited from my regular interactions with hundreds of executives from all backgrounds,
including CEOs, CFOs and PMs of many companies around the world, through teaching sessions,
the development of company-specific programs, consulting, and even informal conversations. I am
grateful to them for sharing their wisdom with me and inspiring many of the ideas in the book.

This book and its companion website would not have been possible without the valuable support
of the people mentioned below. May they find here the expression of my gratitude (and, of course,
if some errors remain in this book, this is obviously their fault...):

• Harmel Leon (†2012), graduate electrical engineer with a specialization in electronics
and automation, in charge of the physical research laboratory at ACEC in Charleroi
(BEL), for the provision of documentation that was used in the sections on Corpuscular
Quantum Physics, Wave Quantum Physics, Quantum Field Theory, Spinor Calculus and
General Relativity.

• Legrand Mathias, Ph.D., Ecole Centrale de Nantes (FRA), for his help in writing
the first 550 pages of the LaTeX eBook version of the website.

• Ricchiuto Ruben, engineering degree in Physics HES (B.Sc.) from the Engineering School
of Geneva (CHE) and mathematician from the University of Geneva, for his valuable
help in plasma physics, electromagnetism, quantum physics, statistics, topology, quantum
chemistry, fractal theory, analysis and many other areas affecting pure mathematics and

• Regular participants in the Les and forum, for their
valuable assistance in many areas of mathematics and physics. The debates and discussions
that took place on the forums help to constantly improve the educational aspect of
this book.

• The Wikipedia and PlanetMath websites, to which I am indebted for many passages borrowed
almost word for word (and this is mutual...).

And thanks to all readers, webmasters and teachers for their websites and quality documents 


2. Acknowledgements 


available for free and anonymously on the Internet, and to regular forum contributors. I have
sometimes reproduced verbatim those of their explanations that required no additions or
corrections. It is probably needless to say that you should not assume that these people are in
total agreement with the scientific views expressed in this book; nor are they responsible for
any errors or obscurities that you might find in it.

Thanks also to a few colleagues and customers who were willing to give me their comments to
improve the content of this book. However, it is certain that it can still be improved on many

Finally, I would especially like to thank all of my family for their continued support and my
friends for their patience while I was almost completely absent, and I would like to send special
thanks to my Dad and Mom for all of their incredible help and support over the last months of
translation of this book! I would also like to apologize to some of my customers and colleagues
because I answered their e-mails and phone calls very slowly for thirteen months in order to
better focus on the translation of this book. Thanks also to my girlfriend for always being there
to take care of me when I forget to take care of myself...

For any public feedback or comment, you can use the guestbook associated with this PDF (for
questions, please use the forum!):

http://www.sciences.ch/htmlen/guestbook.php

or, if you want to give private feedback or comments, you can contact me by email.


info @ sciences. ch 



1 Forewords

2 Methods

2.1 Descartes' Method

2.2 Archimedean Oath

2.3 Scientific Publication Rules (SPR)

2.4 Scientific Mainstream Media communication

3 Vocabulary

3.1 On Sciences

3.2 Terminology

4 Science and Faith


This book, whose first edition was published in 2001, is designed so that
the knowledge required to read it is as basic as possible. It is not necessary
to have a Ph.D. to consult it; you just have to know how to reason, to think
critically, to observe and to have time...

"Simplicity is the seal of truth and it radiates beauty" 



No human endeavor has had more impact than Science 1 on our lives and our conception of the
world and ourselves. Its theories, conquests and results are all around us.

Omnipresent in industry (aerospace, imaging, cryptography, transportation, chemistry,
algorithmics, etc.) and in services (banking, fintech, insurance, human resources, projects,
logistics, architecture, communications, etc.), Applied Mathematics also appears in many other
areas: surveys, risk modeling, data protection, politics, etc. Applied Mathematics influences our
lives (telecommunications, transport, medicine, meteorology, music, project management) and
contributes to the resolution of current issues (energy, health, environment, climate,
optimization, sustainable development, etc.) much more than any soft-skill technique or
methodology! Its great success lies in its wide dissemination in the real world and its increasing
integration in all human and artificial intelligence activities. We are therefore heading toward a
situation where mathematicians and engineers will no longer have the monopoly on mathematics, but
where almost any graduate-level job will involve advanced mathematics.

As a former engineering student, I have often regretted the absence of a single book that is
fairly comprehensive, detailed (without going to the extreme...), educational and, if possible,
free (!) and portable (being personally a fan of eBooks...), giving at least a non-exhaustive idea
of the overall program of Applied Mathematics in engineering schools, with an overview of what
is really used in companies, and with proofs that are more intuitive than rigorous but detailed
enough to spare the reader unnecessary effort. I also wanted a book that does not require the
reader to adopt each time a new notation or terminology specific to the author, when it is not
outright a change to a foreign language... and where anyone can suggest improvements or additions
(through the forum, the guestbook or by e-mail).

I was also frustrated during my studies at quite often having to swallow "formulas" or "laws"
supposedly (and wrongly) non-provable or too complicated, as my teachers said, and even
disappointed by the books of renowned authors (where developments are left to the reader as
exercises and no real applications are even mentioned...). In this book, the prevailing intention
is never to confuse the reader with empty sentences like "it is evident that...", "it is easy to
prove that...", "we leave it to the reader as an exercise...", since all developments are presented in

1 From Latin scientia "knowledge, a knowing, expertness", itself from sciens (genitive scientis),
meaning "intelligent, skilled", present participle of scire, "to know", which probably originally
meant "to separate one thing from another, to distinguish" and is related to scindere, "to cut, divide".


3. Introduction 

detail. But I am not a mathematics purist! I have only one ambition: to explain in the easiest way

Although I have to admit that proving some mathematical relations presented in the engineering
school curriculum is not possible because of a lack of time in the official program or of size
limits in a book, I cannot accept that a teacher or author tells his students (respectively, his
readers) that certain laws are non-provable (because most of the time this is not true!) or that
such and such a proof is too complicated without giving a reference (where the student can find the
information necessary to satisfy his curiosity) or at least a simplified but satisfactory proof.

Moreover, I think that it is totally archaic today that some teachers continue to ask their
students to take a massive quantity of notes during classes. It would be much more favorable and
effective to distribute a course handout containing all the details, in order to be able to
concentrate with students on the essential points, that is to say the oral explanations,
interpretations, understanding, reasoning and practice, rather than excessive blackboard copying...
Obviously, when given a complete course handout, some students will be conspicuous by their
absence, but... so much the better! Thus, those who are passionate can deepen subjects at home or
at the university library, the weak do what they have to do, and the rest (struggling but
hard-working students) will follow the course given by the teacher and take the opportunity to ask
questions rather than mindlessly copying a blackboard.

Inspired by the learning model of an American scholar whose name I have forgotten (...), this book
proposes to, and demands of, the reader the following activities: discover, memorize, cite,
integrate, explain, restate, infer, select, use, decompose, compare, interpret, judge, argue, model,
develop, create, search, reason and develop, in a clear and progressive teaching manner, in order
to build analytical skills and open-mindedness.

So, in my mind, this non-exhaustive book (and its associated companion PDFs) must be a
substitute, free of charge for all students and employees around the world, for many references,
and must fill gaps in the school system, sparing any curious student years of frustration
during his academic curriculum. Otherwise, engineering science could take on the
aspect of a frozen science, cut off from scientific and technical developments: a heteroclite
accumulation of knowledge and especially of formulas, which would make it be regarded as a tasteless
by-product of mathematics and which leads companies and governments to many false results
and bad decisions...

This book has also been designed to meet the needs of executives, both finance and
non-finance managers. Any executive who wants to probe further and grasp the fundamentals
of strategic finance, strategic marketing, project management, engineering and supply chain
issues will benefit from reading it.

This book also aims to describe and explain how our Universe and our World (and
other "worlds" in our Universe) work in a much more accurate, complete and detailed
way than any Holy book. It gives models and quantification methods for the origin of species,
of galaxies and of planets, for quantum phenomena, physical motion, stellar physics,
extreme observable events and extremely rare events, and it explains social strategies and
modern technologies in a mathematical and provable way that everyone can check for themselves,
stating every time the assumptions that any reasonable entity should be aware of!


Obviously, Applied Mathematics is such an abundant topic that a book of this scale can only
accommodate the basics. Readers are certainly encouraged to go beyond this (see the bibliography
at the end of the book).

Now, those who see Applied Mathematics only as a tool (which it also is), or as the enemy of
religious beliefs, or as a boring school subject, are legion. However, it is perhaps useful
to recall that, as Galileo said, "the book of nature is written in the language of mathematics"
(without wishing to indulge in scientism!). It is in this spirit that this book discusses Applied
Mathematics for students in the Natural, Earth and Life sciences, as well as for all those who
have an occupation related to the various subjects, including philosophy, or for anyone curious to
learn about the involvement of science in everyday life.

The choice to study engineering in this book as a branch of Applied Mathematics comes from
the fact that the differences between all areas of physics (formerly known as "natural
philosophy") and mathematics are so slight that the Fields Medal (the highest award today in
the field of mathematics) was awarded in 1990 to the physicist Edward Witten, who used physical
ideas to prove a mathematical theorem. This trend is certainly not fortuitous, because we can
observe that every science, as it seeks a more detailed understanding of the subject
it studies, always ends its journey in pure mathematics (the absolute path par excellence...).
Thus, we can predict, in the far future, the convergence of all the sciences (pure, exact or social)
toward mathematics for modeling techniques (see for example the French PDF "L'explosion
des mathématiques", available on the download page of the companion website).

It can sometimes seem difficult (due to an irrational, obscure and unjustified fear of the pure
sciences in a large fraction of our contemporaries) to transmit the feeling of the mathematical
beauty of nature, its deepest harmony and the well-oiled mechanics of the Universe to those
who know only the basics of algebra. The physicist Richard Feynman once spoke of "two
cultures": people who have, and those who do not have, sufficient understanding of mathematics
to appreciate the scientific structure of nature. It is a pity that mathematics is necessary to
deeply understand nature and that it also has a bad reputation. For the record, it is claimed
that a king who asked Euclid to teach him geometry complained about its difficulty. Euclid
replied, "There is no royal road". Physicists and mathematicians cannot convert themselves to
a different language. If you want to learn about nature, to appreciate its true value, you must
understand its language. Nature is revealed only in this form and we cannot be so pretentious
as to ask it to change this fact.

In the same way, no intellectual discussion will allow you to communicate to a deaf person
what you really feel while listening to music. Similarly, all the discussion in the world remains
powerless to transmit an intimate understanding of nature to those of the "other culture".
Philosophers and theologians may try to give you qualitative ideas about the Universe. The
fact that the scientific method (in the full sense of the term) cannot convince the world of its
truth and purity is perhaps due to the limited horizon of some people, who imagine that
the human being, or some other intuitive, sentimental or arbitrary concept, is the center of the
Universe (anthropocentric principle).

Of course, in order to share this mathematical knowledge, it may seem paradoxical to add,
with our work, to the long list of books already available in libraries, in shops and on the
Internet. Nevertheless, I must be able to present arguments that justify the creation of such a
book (and its associated website) as compared to books such as those of Feynman, Landau or Bourbaki



and Wikipedia/Wolfram themselves or Khan Academy or OpenStax. So what do I think I can 
add to such a wealth of material? 

1. The great pleasure that we take in writing this book ("keeping our hand in" and improving our
skills) and in having a detailed, high-quality compendium of tools for our customers and our
students (and also for everyone around the world) for free.

2. The passion for sharing knowledge for free (the battle against "copyright madness"; RIP Aaron
Swartz!) and without frontiers, with a quality tool such as LaTeX (unlike Wikipedia, which mixes
LaTeX and normal text, and unlike the awful and shameful content of Khan Academy 2 ).

3. Because we cannot wait, as there are places in the world where the absence of teaching of
modern science and its methodology leads people to hold beliefs that take them down
dangerous and obscure paths.

4. We want to offer Applied Mathematics in an enjoyable and easy-to-learn manner ("keep
it simple and stupid", unlike Landau's 9 graduate-level books), because we
believe that Applied Mathematics changes the way we understand the Universe.

5. This book was first written in French (in 2001), before the French version of
Wikipedia had good mathematical content and long before Khan Academy or OpenStax
even existed.

6. The opportunities for quick updates/corrections (unlike Khan Academy) and for collaboration
offered by a free e-book (with associated effective search tools), without topics
that disappear (unlike Wikipedia).

7. Content that depends on readers' requests/comments and on our interests (unlike
Khan Academy, OpenStax or Landau's books)!

8. Unlike scientific publications (PRL or similar), which suck because they do not
give detailed proofs and sometimes lead to an infinite loop of references.

9. Access to the LaTeX sources for everybody, so that nobody needs to reinvent the wheel and lose
hundreds or thousands of hours on writing instead of innovation (unlike Landau's
books)!

10. Rigorous presentation with simplified detailed proofs of all presented concepts (unlike
Wikipedia, Khan Academy and OpenStax, which focus only on the mathematical
proofs of undergraduate concepts).

11. The presentation of many advanced and detailed mathematical tools used in business and 

12. The opportunity for students and teachers to reuse content by copy/paste (unlike
Khan Academy or Landau's books).

2 OpenStax has good undergraduate PDFs (especially the examples in their books), but between 40%
and 60% of the proofs are missing, and neither the table of contents nor the index of their PDFs
is interactive... and, a major issue: the content is limited to undergraduate subjects.


13. Constant and fixed notation (unlike Wikipedia, Khan Academy and OpenStax)
throughout the book for mathematical operators, a clear language on all topics (3C
criterion: clear, complete and concise) and a focus on the basics, so as to do important
pedagogical work on the subjects (unlike Landau's books).

14. Gathering as much information as possible about the pure and exact sciences in one electronic
(portable), homogeneous and rigorous book (but one that does not go as far as Landau's books).

15. Freedom from all pseudo-truths: only truths that can be proven.

16. Benefit from the development of teaching methods that use the Internet to search for the 
solution of mathematical problems. 

17. The dramatic improvement of automatic translation software and computing power that
will, at least we hope, make this book a reference in the sciences.

18. and... because Applied Mathematics is beautiful, especially when written in LaTeX
and illustrated (unlike Landau's books, whose illustrations are quite old and

And also... I believe that the results of individual research are the property of humanity and
should be available to all those who explore the phenomena of nature anywhere. In this way the
work of each benefits all, it is for all humanity that our knowledge accumulates, and this
is the trend that the Internet makes possible.

I do not hide that my contribution is, to this day, largely limited to that of a collector who
gleans his information from the works of masters, from publications or from anonymous web pages,
and who completes and argues the developments and improves them when this is possible. Those
who would accuse me of plagiarism should reflect on the fact that the theorems presented
in most non-free, commercially available books were discovered and written by their authors'
predecessors, and that those authors' own personal contribution was also, like mine, to put all
this information in a clear and modern form a few hundred years later. In addition, it can be seen
as questionable that we are asked to pay for access to a culture that is certainly the only truly
valid and fair one in this world and where there is no patent or intellectual property right.

This book also reflects my own intellectual limitations. Although I try to study as many science
and math fields as possible, it is impossible to master them all. This book clearly shows
my own interests and experiences as a consultant, but also my strengths and my weaknesses. I
am responsible for the selection of inputs and, of course, for possible errors and imperfections.

After attempting a strict (linear) order of presentation of the subject, I decided to arrange this
book in a more pedagogical (thematic) way, always with practical examples of applications.
It is in my opinion very difficult to speak of so vast a subject in a purely mathematical order in
only one human life, that is to say, with the concepts introduced one by one from those
already known (so that no theory, operator, tool, etc. would appear before its definition).
Such a plan would require cutting the book into pieces that are no longer thematic. So I decided
to present things in a logical order and not in order of need. Thus the reader will be confronted,
like the author himself, with the extreme complexity of the subject.



The consequences of this choice are the following: 

1. Sometimes it will be necessary to accept certain concepts and understand them only later.

2. It will probably be necessary for the reader to go through the book at least twice. At
the first reading, we grasp the essentials, and at the second reading, we understand the
details (I congratulate those who understand all the subtleties the first time).

3. You must accept the fact that some topics are repeated and that there are many cross- 
references and complementary remarks. 

Some know that for every theorem and mathematical model, there are almost always several
methods of proof. I have always tried to choose the one that seemed the simplest (e.g. in
relativity and quantum physics there are the algebraic and matrix formalisms). The objective is to
arrive at the same result anyway.

As this book is still a draft, it necessarily has shortcomings in convergence checks, continuity,
grammar and other respects... (which will horrify some readers and mathematicians...)! However,
I have avoided (or, otherwise, I indicate it) the usual approximations of physics, and I use
dimensional analysis as little as possible. I also try to avoid as much as possible
subjects with mathematical tools that have not previously been presented and demonstrated

Finally, this presentation, that can still be improved, is not an absolute reference and contains 
errors. Any comment is welcome. I shall endeavour, as far as possible, to correct the weaknesses 
and make the necessary changes as soon as possible. 

However, while mathematics is accurate and indisputable, theoretical physics (its models) is
still interpreted in common vocabulary (not in mathematical vocabulary) and its
conclusions are all relative. I can only advise you, when you read this book, to read for yourself
and not to be subjected to outside influences. You must have a very (very) critical mind, take
nothing for granted and question everything without hesitation. In addition, the motto of a
good scientist should be: "Doubt, doubt, doubt... doubt again, and always check." We also
recall that "nothing that we can see, hear, smell, touch or taste is what it seems to be"; therefore
do not rely on your daily experience to draw hasty conclusions. Be critical, Cartesian, rational
and rigorous in your developments, reasoning and conclusions!

To those who try to derive by themselves the results of some developments in this book, I want
to say: do not worry if you do not succeed, or if you doubt your abilities because of
the time spent solving an equation or problem. Some theories that seem obvious or easy today
sometimes needed several weeks, months or even years to be developed by leading mathematicians
or physicists in the past!

I also tried to ensure that this book is pleasing to the eye and to read through. 

Finally, I have chosen to write this work in the first person plural: "we". Indeed,
mathematical physics is not a science that has been made or has evolved through individual work,
but through intensive collaboration between people connected by the same passion and desire for
knowledge. Thus, by making use of "we", I would like to pay tribute to the dead and missing


scientists, to contemporary and future researchers for the work they will perform in order to 
approach the truth and wisdom. 






Science is the set of all systematic efforts (scrupulous observations and plausible assumptions,
until evidence to the contrary) to acquire knowledge about our environment and to organize and
synthesize it into testable laws and theories, whose main purpose is to explain the "how" of
things (and NOT the why!), often by a five-step approach:

— What do we have? 

— Where will we go? 

— What is our goal? 

— Does it fit the data? 

Scientists have to submit their ideas and results to independent verification and replication by
their peers ("peer review"). They must abandon or modify their conclusions when confronted
with more complete or different evidence. The credibility of Science is therefore based on this
self-correcting mechanism, and this is why, still in the 21st century, Science is not claimed to be
the best tool (as we do not know what will exist in the future...) but has proven to be
the best method of investigating the truth in comparison with all other currently existing methods
or beliefs. The history of science shows that this system has worked for a very long time and very
well compared to all the others. In each area, progress has been spectacular. However, the system
has sometimes failed and must itself be corrected before small drifts accumulate.

The downside is that scientists are human. They have the imperfections of all humans,
especially vanity, pride and conceit. Nowadays, it happens that many people working on the
same topic for a given time develop a common faith and believe they hold the truth. The leader
of the faith is the Pope, and he distills his opinion. The Pope who plays the game takes his miter
and his pilgrim's staff to evangelize his fellow heretics. Up to that point, it merely raises a
smile. But, as in real religions, they sometimes become annoying in wanting to impose their opinion
on those who do not believe. Some of these "churches" do not hesitate to behave like the
Inquisition. Those who dare to express a different opinion are burned at every opportunity, during
conferences or at their place of work. Some young researchers, uninspired, prefer to convert to the
dominant religion, to become clerics faster, rather than innovative researchers or even
iconoclasts. The great Pope writes his Bible to disseminate his ideas and imposes it as required
reading on students and newcomers. He thus shapes the thinking of younger generations and secures
his throne. This is a medieval attitude that can block progress. Some Popes go so far as to believe
that being the Pope in their field of specialization automatically gives them the same throne in
all other areas...

This warning, and the reminders that will follow, must lead scientists to question themselves by
making good use of what we consider today to be good working practices (we will discuss
the principles of Descartes' method further below) to solve problems or develop theoretical
models.

3. Introduction 

EAME v3. 5-2013 

For this purpose, here is a summary table of the steps that should be followed by a
scientist working in mathematics or theoretical physics (for definitions, see just below):



Mathematics

1. State formally or in common language the "hypothesis", the "conjecture" or the "property"
to prove (hypotheses are denoted H1., H2., etc., conjectures CJ1., CJ2., etc. and
properties P1., P2., etc.).

2. Define the "axioms" (non-demonstrable, independent and non-contradictory) that will give
the starting points and establish restrictions on the developments (the axioms are denoted
A1., A2., etc.) 3. In the same vein, the mathematician defines the specialized vocabulary
related to mathematical operators, which will be denoted D1., D2., etc.

3. Once the axioms are laid down, directly derive "lemmas" or "properties" whose validity
follows immediately and which prepare the development of the theorem supposed to validate the
initial hypotheses or conjectures (lemmas being denoted L1., L2., etc. and properties P1., P2.,
etc.).

4. Once the "theorems" (denoted T1., T2., etc.) are proved, conclude with the "consequences"
(denoted C1., C2., etc.) and even properties (denoted P1., P2., etc.).

5. Test the strength (robustness) or usefulness of the conjectures or hypotheses by proving the
converse of the theorem or by comparing them with other examples from well-known mathematical
theories, to see whether together they form a coherent structure (examples being denoted E1.,
E2., etc.).

6. Possible remarks may be given in a hierarchically structured order and denoted R1., R2., etc.

Physics

1. State correctly, formally or in common language, all the details of the "problems" to solve
(problems are denoted P1., P2., etc.).

2. Define (or state) the "postulates" or "principles" or the "hypotheses" and "assumptions"
(supposedly unprovable...) that will give the starting point and establish restrictions on the
developments (typically, postulates and principles are denoted P1., P2., etc. and hypotheses
and assumptions H1., H2., etc., taking care to avoid notational confusion between postulates
and principles) 4.

3. Once the "theoretical model" is developed, check the units of the equations for possible
errors in the developments (such checks being denoted VA1., VA2., etc.).

4. Search for borderline cases (including "singularities") of the model to verify its validity
intuitively (these borderline checks are denoted CL1., CL2., etc.).

5. Test the theoretical model experimentally and submit the work for comparison with other
independent research teams. The new model should predict experimental results never observed
before (predictions to falsify). If the model is validated, it acquires the official status of
"theory".

6. Possible remarks may be given in a hierarchically structured order and denoted R1., R2., etc.

Table 3.1 - Methodology for Maths & Physics Developments 
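
Step 3 of the physics column (checking the units of the equations) can be partly mechanized. The following Python sketch is only an illustration under stated assumptions: dimensions are represented as integer exponents of mass, length and time, and the names `Dim`, `MASS`, `LENGTH` and `TIME` are hypothetical helpers introduced here, not notation from this book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Dimension of a quantity as exponents of (mass, length, time).

    Hypothetical helper for illustration only.
    """
    mass: int = 0
    length: int = 0
    time: int = 0

    def __mul__(self, other):
        # Multiplying quantities adds the exponents of their dimensions.
        return Dim(self.mass + other.mass,
                   self.length + other.length,
                   self.time + other.time)

    def __truediv__(self, other):
        # Dividing quantities subtracts the exponents.
        return Dim(self.mass - other.mass,
                   self.length - other.length,
                   self.time - other.time)

    def __pow__(self, n):
        # Raising to an integer power multiplies the exponents.
        return Dim(self.mass * n, self.length * n, self.time * n)


MASS = Dim(mass=1)
LENGTH = Dim(length=1)
TIME = Dim(time=1)

velocity = LENGTH / TIME
acceleration = velocity / TIME
force = MASS * acceleration
work = force * LENGTH
# The numeric factor 1/2 in (1/2) m v^2 is dimensionless and is ignored here.
kinetic_energy = MASS * velocity ** 2

# A relation such as "work = change of kinetic energy" passes the check
# only if both sides have the same dimension.
print("work vs kinetic energy:", work == kinetic_energy)
print("force vs kinetic energy:", force == kinetic_energy)
```

Such a check can only reject an equation whose two sides have different dimensions; it says nothing about dimensionless factors or about whether the physics is right.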

Proceeding as in the above table is a possible working basis for people working in mathematics
and physics. Obviously, proceeding cleanly and traditionally as above takes a little more time
than doing things any old way (this is why most teachers do not follow these rules: they do not
have enough time to cover the entire course program).

3 Sometimes "properties", "conditions" and "axioms" are confused, while the concept of axiom is
much more precise and profound.

4 You should not forget, however, that the validity of a model does not depend on the realism of
its assumptions but on the conformity of its implications with reality.

info @ sciences. ch

[Figure: diagram with the labels "definitions", "hypothesis", "theoretical results", "empirical
observations", "process" and "changes in hypothesis"]

Note also a playful version of the 8 commandments of science:

1. The phenomena you will observe
And never measurements you will falsify

(beware of confirmation bias: studying only the phenomena that validate your belief)

2. Hypotheses you will propose
That with experiment you will test

3. The experiment precisely you will describe
Because your colleague will reproduce it

(beware of the narrative discipline trap: fitting the facts to the desired results)

4. With your results
A theory you will build

5. Parsimony you will use
And the simplest hypothesis you will retain

6. Ultimate truth will never be (epistemic humility)
And always you will search for the truth

7. From a non-refutable thesis you will refrain
Because outside of science it will remain

8. All failures will be like a success
Because science can confirm but also invalidate


R1. Caution! It is very easy to make new physical theories just by lining up words. This
is called "philosophy", and it is how the Greeks came to the idea of atoms. With a lot of
luck, this can lead to a true theory. By contrast, it is much more difficult to make a
"predictive theory", that is to say one with equations that predict the outcome of an experiment.

R2. What separates mathematics from physics is that in mathematics the hypothesis is
always true. Mathematical discourse does not aim to prove an external truth; it aims at
consistency. Only the reasoning has to be correct.


When these rules are not respected, we speak of "scientific fraud" (which often leads to
dismissal, although unfortunately diplomas are still not revoked when it happens). In
general, scientific fraud comes in three main forms: plagiarism, fabrication of data, and
alteration of results unfavourable to the hypothesis, as well as the omission of clear working
hypotheses and of collected data. To these frauds we can also add behaviours that raise problems
regarding the quality of the work or, more specifically, ethics, such as those aimed at inflating
apparent output (and through it the fame of the scientist), for example by submitting the same
publication several times with only a few modifications, the omission of conflicts of interest,
dangerous experiments, the failure to preserve primary data, etc.

2.1 Descartes’ Method 

We now present the four principles of Descartes' method. Descartes, as a reminder, is considered
the first scientist in history because of his method of analysis:

P1. Never accept anything as true unless I evidently know it to be so. That is to say,
carefully avoid haste, and include nothing more in my judgments than what presents itself
so clearly and distinctly to my mind that I have no occasion to doubt it.

P2. Divide each of the difficulties I have to examine into as many parts as possible (scrupulous
observations and hypotheses held plausible until proven otherwise), and as many as are
required to resolve them in the best way.

P3. Conduct my thoughts in order, beginning with the simplest objects and the easiest to know,
to rise gradually, by degrees, to the knowledge of the most complex, even assuming an order
among those that do not naturally precede one another.

P4. Make everywhere enumerations so complete and reviews so general that I am sure of
omitting nothing.



2.2 Archimedean Oath 

Inspired by the Hippocratic Oath, a group of students of the École Polytechnique Fédérale de
Lausanne developed in 1990 an Archimedean Oath expressing the responsibilities and duties
of the engineer and technician. It was adopted in various versions by other European engineering
schools and could serve as a basis for an oath for scientific researchers (even if some
important points are missing).

"Considering the life of Archimedes of Syracuse, who illustrated as early as Antiquity the
ambivalent potential of technology, considering the growing responsibility of engineers and
scientists towards people and nature, considering the importance of the ethical problems
raised by technology and its applications, today I make the following commitments and will
endeavour to tend towards the ideal they represent:

1. I will practice my profession for the good of people, with respect for Human Rights and
the Environment.

2. I will acknowledge, having informed myself as well as possible, the responsibility for my
acts and will in no case shift it onto others.

3. I will endeavour to perfect my professional skills.

4. In the choice and the realization of my projects, I will remain attentive to their context
and their consequences, in particular from the technical, economic, social and ecological
points of view... I will pay particular attention to projects that may have military purposes.

5. I will contribute, as far as my means allow, to promoting equitable relationships
between people and to supporting the development of lower-income countries.

6. I will pass on, with rigour and honesty, to interlocutors chosen with discernment,
any important information, if it represents a benefit for society or if withholding it
constitutes a danger to others. In the latter case, I will see to it that the information
leads to concrete measures.

7. I will not let myself be dominated by the defence of my interests or those of my profession.

8. I will make an effort, as far as my means allow, to lead my company to take into
account the concerns of this Oath.

9. I will practice my profession in all intellectual honesty, with conscience and dignity.

10. I promise this solemnly, freely and on my honour."


2.3 Scientific Publication Rules (SPR) 

It is impossible to have a constructive debate or analysis if the basic material is unusable.
Sadly, still in the 21st century, it is easy to find Nobel Prize publications that were
peer-reviewed and yet are scientifically unusable. This is why we recall here the basic
scientific publication rules for a publication to be accepted by a real scientific peer-review
committee:

1. Use of LaTeX for the writing of the publication 

2. All manuscript files and raw data files must have ISO-compliant names

3. The publication should have a GUID 

4. Put the publication date in the publication 

5. Put the major and minor version of the publication (e.g. v3.6 r58)

6. Put the experiment (development) period date (ISO date format) 

7. Write an abstract 

8. Write an introduction 

9. All measurement units must follow ISO standards 

10. Use the "precautionary principle" (use of the conditional)

11. Use "reactive responses", that is to say confront hypotheses with data, hypotheses with
facts and hypotheses with observations

12. Use, when available, "leverage factors" to give substance and credit to the work by
referring to other corresponding publications on the same subject 5

13. Material and Methods should be described in detail. Theoretical papers should
provide a link (URL) or reference where the full detailed proof can be found (if the detailed
proof is omitted in the original publication!)

14. Include high-resolution screenshots of charts, or photos

15. Write up the results and, for experimental data, always provide a statistical analysis
showing whether the effect seems significant or not (including the sample size effect or the
fluctuation interval)

16. Calculate the propagation of errors of measurement instruments 

17. Write a cautious conclusion

18. Give access to the raw data in a non-proprietary format to the scientific community 

19. Give access to the scripts/code used for data analysis to the scientific community 

5 This is also the very important step of "personal review", that is to say a personal analysis
of several tens or hundreds of scientific publications, from which you draw a critical analysis
that you use to build your own argument.



20. Give access to the LaTeX sources of the publication to the scientific community 

21. Provide the exact version (with minor release) of the software used to publish the paper

22. Put the bibliography with the references 

23. State the percentage of financial support from each sponsor

24. Submit the paper to the peer-review committee 

25. List all actors (with position, grade, e-mail) and peer-reviewers (name only for the latter)
of the paper

Any publication that fails to respect even one of these rules cannot be considered a
"scientific" publication!
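
Rule 16 above (propagation of the errors of measurement instruments) can be illustrated with a short Python sketch. It is a minimal example under explicit assumptions: the two measured quantities are independent, their uncertainties are small enough for first-order (linearized) propagation, and the function name `propagate_quotient` as well as the numerical values are hypothetical, chosen only for illustration.

```python
import math


def propagate_quotient(x, sigma_x, y, sigma_y):
    """Return (q, sigma_q) for q = x / y.

    Assumes independent uncertainties and first-order propagation,
    so that relative uncertainties add in quadrature:
        (sigma_q / q)^2 = (sigma_x / x)^2 + (sigma_y / y)^2
    """
    q = x / y
    relative = math.sqrt((sigma_x / x) ** 2 + (sigma_y / y) ** 2)
    return q, abs(q) * relative


# Hypothetical measurement: a distance of 100.0 m known to 0.5 m,
# covered in 20.0 s known to 0.2 s.
speed, sigma_speed = propagate_quotient(100.0, 0.5, 20.0, 0.2)
print(f"v = {speed:.2f} m/s with uncertainty {sigma_speed:.2f} m/s")
```

If the measured quantities were correlated, a covariance term would have to be added; the sketch deliberately leaves that case out.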


Even if there is a consensus among scientists, a single biased study (which can be
very important) can be used to influence the opinion of mainstream media, governments
and the public. This is why a study must always be carried out and peer-reviewed by
independent teams and laboratories.


2.4 Scientific Mainstream Media communication 

The reader of mainstream media or social networks should never trust a report on a scientific
study if the peer-reviewed reference paper is not given as a link. Nor should the reader take a
result as absolute if the claimed consensus of the scientific community rests on only... ONE...
study. The only way to be almost sure is to read the study itself and check whether it respects
the above protocol.

A typical bad example is a news item about Lyme borreliosis that was picked up by many
international mainstream media, as follows:

A simple antibiotic ointment based on azithromycin has proven its effectiveness
against Lyme borreliosis, a serious disease transmitted by ticks, according to a
study with Swiss participation that was published on Tuesday.

Applied for three days, no later than 72 hours after the tick bite, the ointment
showed an efficacy of 100%, according to tests carried out on 1000 patients:
none developed Lyme borreliosis.

Meanwhile, seven infections appeared in the group treated with a placebo,
according to the results of this research published in The Lancet.

Avoiding three weeks of antibiotics

First identified in the United States in 1975, Lyme disease, a condition of
bacterial origin, can lead to serious neurological and joint complications if it is not...

Figure 3.1 - Swiss TV publication about Lyme borreliosis treatment on 2017-01-08 (source: RTS App)

In summary, what the "scientific journalist" (hmm hmm... I think it must in fact have been a new
intern...) of one of the main Swiss national television channels (so a broadcaster with enough
money to investigate any news correctly... at least in theory... in a country that claims to be
number one in almost everything...) published is a very bad (catastrophic) interpretation of the
real article. The above news item reports that "...a treatment applied for 3 days, no later than
72 hours after the tick bite, showed an efficacy of 100%...".

In reality (if the media had read the publication to the end...) the study was stopped after 8
weeks and it was shown that the treatment has no better effect than a placebo...




Physics and mathematics, like any field of specialization, have their own vocabulary. So that the
reader does not get lost in certain texts of this PDF, we have chosen to present here a few
fundamental words, abbreviations and definitions to know.

Thus, mathematicians like to finish their proofs (when they think they are correct) with the
abbreviation "Q.E.D.", which stands for "Quod Erat Demonstrandum" (Latin for "which was to be
demonstrated").

And in definitions (there are many in mathematics and physics...) scientists often use the
following expressions:

• ... it is sufficient that ... 

• ... if and only if ... 

• ... necessary and sufficient ... 

• ... means ... 

• ... prove it ... 

These expressions are not all equivalent (identical in the strict sense): "it is sufficient that"
corresponds to a sufficient condition but not to a necessary one, whereas "if and only if" states
a condition that is both necessary and sufficient. It must also be noted that these expressions
are to be understood in the context of data analysis, data accuracy, reproduction and peer review,
and not of any personal or common belief or emotional stance of a group of people (even
if this group numbers more than a few billion individuals...)!
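
The difference between a sufficient and a necessary condition can be made concrete with a small Python sketch. This is only an illustration: the helper names `is_sufficient` and `is_necessary` and the divisibility example are assumptions introduced here, and checking a finite range of integers illustrates the idea without constituting a proof.

```python
def is_sufficient(p, q, domain):
    """p is sufficient for q on the domain if q holds whenever p holds."""
    return all(q(n) for n in domain if p(n))


def is_necessary(p, q, domain):
    """p is necessary for q on the domain if p holds whenever q holds."""
    return all(p(n) for n in domain if q(n))


def divisible_by_4(n):
    return n % 4 == 0


def is_even(n):
    return n % 2 == 0


def divisible_by_2(n):
    return n % 2 == 0


domain = range(1, 101)  # finite check, an illustration rather than a proof

# "For n to be even, it is sufficient that n is divisible by 4" ...
print(is_sufficient(divisible_by_4, is_even, domain))
# ... but it is not necessary (2 is even without being divisible by 4).
print(is_necessary(divisible_by_4, is_even, domain))
# "n is even if and only if n is divisible by 2": necessary and sufficient.
print(is_sufficient(divisible_by_2, is_even, domain)
      and is_necessary(divisible_by_2, is_even, domain))
```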


3.1 On Sciences 

It is important to define rigorously the different types of sciences to which people often
refer. Indeed, it seems that in the 21st century a misuse of the term has taken hold and that it
has become impossible for people to distinguish the "intrinsic quality" of one "science" from
that of another.


Etymologically, the word "science" comes from the Latin "scientia" (knowledge), whose
root is the verb "scire", which means "to know".


This abuse of language probably stems from the fact that the pure and exact sciences are losing
their illusions of universality and objectivity, in the sense that they are self-correcting. As
a result, some sciences that are relegated to the background try to borrow their methods,
principles and origins, which creates confusion. We must therefore be very careful about claims
of scientificity in the human sciences, and this is also (or especially) true for the dominant
trends in economics, sociology and psychology. Quite simply, the issues addressed by the human
sciences are extremely complex and poorly reproducible, and the empirical arguments supporting
their theories are often quite weak.

By itself, however, science does not produce absolute truth. In principle, a scientific theory
is valid as long as it can predict measurable and reproducible results. But the problems of
interpreting these results belong to natural philosophy.

Given the diversity of phenomena to be studied, over the centuries there has been a growing
number of disciplines such as chemistry, biology, thermodynamics, etc. All these a priori
heterogeneous disciplines have physics as their common foundation, mathematics as their language
and the scientific method as their elementary principle.

Thus, a small refresher seems useful:

Definitions (#1): 

D1. We define as "pure science" any set of knowledge based on rigorous reasoning that is valid
whatever the (arbitrary) elementary factor selected (we then say "independent of sensible
reality") and restricted to the minimum necessary. Only mathematics (often named the
"queen of sciences") can be classified in this category.

D2. We define as "exact science" or "hard science" any set of knowledge based on the study
of an observation that has been transcribed in symbolic form and that can be reproduced
(theoretical physics for example... sometimes...). Primarily, the purpose of the
exact sciences is not to explain the "why" but the "how".

And never forget... Science (especially physics) doesn't have to "make sense"; it just has
to make all the right, testable predictions!

According to the philosopher Karl Popper, a theory is scientifically acceptable if, as presented,
it is "falsifiable" (synonyms are "refutable" or "testable"), i.e. if it can be subjected to
experimental tests (or if it is possible to conceive of an observation or an argument that
negates the statement in question). "Scientific knowledge" is then by definition the set of
theories that have resisted falsification. Science is by nature subject to continuous questioning.

Caution! There is no doubt that the exact sciences still enjoy enormous prestige, even among
their opponents, because of their theoretical and practical success. It is certain that some
scientists sometimes abuse this prestige by showing a sense of superiority that is not necessarily
justified. Moreover, it often happens that these same scientists present, in the popular
literature, very speculative ideas as if they were well established, and extrapolate their results
outside the context in which they were tested (and ... beyond the hypotheses under which they were
once checked...).

D3. We define as "engineering science" any set of knowledge or practices applied to the needs 
of human society such as electronics, chemistry, computer science, telecommunications, 
robotics, aerospace, biotechnology... 

D4. We define as "science" any body of knowledge based on studies or observations of events
whose interpretation has not yet been transcribed and verified with the mathematical rigour
characteristic of the previous sciences, but which uses comparative statistics. We include in
this definition: medicine (we should however be careful, because some parts of medicine
study phenomena using mathematical descriptions, such as neural networks or other
phenomena associated with known physical causes), sociology, psychology, history, biology, etc.

Some teachers like to present the word "science" as an acronym (not a bad idea for
college students): Solve, Create, Investigate, Evaluate, Notice, Classify, Experiment.

D5. We define as "soft science" or "para-science" any set of knowledge or practices that are
currently based on facts that are non-verifiable (not scientifically reproducible) by
experiment or by mathematics. We include in this definition: astrology, theology, the paranormal
(which was demolished by zetetics), graphology...

As some scientists say: «It looks like science, it uses the vocabulary of science... but that's
not science at all.»

D6. We define as "phenomenological science" or "natural sciences", any science which is not 
included in the above definitions (history, sociology, psychology, zoology, biology, ...) 

D7. "Scientism" is an ideology that considers experimental science to be the only valid mode
of knowledge or, at least, superior to all other ways of interpreting the world. In
this perspective, there are no philosophical, religious or moral truths superior to scientific
theories. Only what is scientifically proven counts.

D8. "Positivism" is a set of ideas holding that only the analysis and understanding of
facts verified by experience can explain the phenomena of the sensible world. Certainty
is provided solely by scientific experiment. It rejects introspection, intuition and the
metaphysical approach as ways of explaining any knowledge of the phenomena.

What is interesting about this doctrine is that it is certainly one of the few that
requires people to think for themselves and to understand the environment around
them by continually questioning everything and by never taking anything for granted
(...). In addition, the real sciences have the extraordinary property of making it
possible to understand things beyond what we can see.

But science is science, and nothing more: a certain ordering of things, with not too bad a
success, that no longer leads to metaphysics as in the time of Aristotle, but that does not claim
to give us the whole story about reality or even to get to the bottom of visible things.



3.2 Terminology 

The table of methods presented above contains terms that may seem unknown or
barbaric to you. This is why it seems important to provide definitions of these terms and of
some other equally important ones, in order to avoid serious confusion.

Definitions (#2):

D1. Beyond its negative sense, the idea of "problem" refers to the first step of the scientific
method. Formulating a problem is essential for its resolution: it allows us to understand
properly what the problem is and to see what needs to be resolved.

The concept of "problem" is intimately connected to the concept of "assumption",
whose definition we will see below.

D2. A "hypothesis" is always, in the context of a theory already established or underlying, a 
supposition awaiting confirmation or refutation that attempts to explain a group of facts 
or predict the onset of new facts. 

Thus, a hypothesis can be at the origin of a theoretical problem that has to be 
resolved formally. 

D3. The "postulate" or "assumption" in physics frequently corresponds to a principle (see
definition below) whose acceptance is required to establish a proof (we mean that it is a
non-provable proposition).

The mathematical equivalent (but in a more rigorous version) of the assumption is
the "axiom", whose definition we will see below.

D4. A "principle" (close relative of the "postulate") is a proposition accepted as a basis for
reasoning, or a general theoretical guideline for the reasoning that needs to be performed. In
physics, it is also a general law governing a set of phenomena and verified by the accuracy of
its consequences.

The word "principle" is misused in junior classes or engineering schools by teachers who do not
know how to (which is very rare), do not want to (rather common), or cannot for lack of
time (almost always the case) prove a relation.

The equivalent of the postulate or principle in mathematics is the "axiom", which
we define as follows:

D5. An "axiom" is a self-evident proposition, or a truth by itself, whose acceptance is necessary
to establish a proof.



R1. We could say that an axiom is something we define as true for the discourse that we
develop, like a rule of the game, and that it does not necessarily have a universal truth value
in the sensible world around us.

R2. Axioms must always be independent (none should be provable from the
others) and non-contradictory (we also say that they must be "consistent").


D6. The "corollary" is a term unfortunately almost nonexistent in physics (wrongly!); it is
in fact a proposition resulting from a truth already demonstrated. We can also say that
a corollary is an obvious and necessary consequence of a theorem (or sometimes of a
postulate in physics).

D7. A "lemma" is a proposition deduced from one or more assumptions or axioms, whose proof
prepares that of a theorem.

D8. A "conjecture" is a supposition or opinion based on the likelihood of a mathematical
result.

Many conjectures play a role somewhat similar to lemmas, as they are checkpoints on the way
to significant results.

D9. Beyond its weak sense of conjecture, a "theory" or "theorem" is a whole articulated around a
hypothesis and supported by a set of facts or developments that give it positive content
and make the hypothesis well-founded (or at least plausible in the case of theoretical
physics).

D10. A "singularity" is an indeterminacy in a calculation that takes the appearance of a division
by zero. This term is used both in mathematics and in physics.

D11. A "proof" is a set of mathematical steps to follow in order to establish the result,
already known or not, of a theorem.

D12. If the word "paradox" etymologically means "contrary to common opinion", it is not out of
a pure taste for provocation, but rather for solid reasons. A "sophism", meanwhile, is a
deliberately provocative statement, a false proposition based on apparently valid
reasoning. Thus we speak of "Zeno's paradox" when in reality it is only a sophism.
The paradox is not limited to falsity, but implies the coexistence of truth and falsity, so that
one can no longer distinguish the true from the false. The paradox then appears as an unsolvable
problem, an "aporia".




It should be added that the well-known paradoxes, by the questions they raised, have
permitted significant advances in science and led to major conceptual revolutions in
mathematics as in theoretical physics (the paradoxes on sets and on infinity in mathematics,
and those at the base of relativity and quantum physics).
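
Definition D10 above describes a singularity as an indeterminacy that looks like a division by zero. The following Python sketch, given purely as a numerical illustration, compares two expressions that are both undefined at x = 0: sin(x)/x, which has the form 0/0 but approaches 1, and 1/x, which grows without bound. The function names and the sample points are choices made for this example only.

```python
import math


def sin_over_x(x):
    """sin(x)/x: undefined at x = 0 (form 0/0), but tends to 1 as x -> 0."""
    return math.sin(x) / x


def one_over_x(x):
    """1/x: undefined at x = 0 and unbounded as x -> 0."""
    return 1.0 / x


# Approach the borderline case x = 0 from the right without ever reaching it.
for x in (1e-1, 1e-3, 1e-6):
    print(f"x = {x:g}: sin(x)/x = {sin_over_x(x):.12f}, 1/x = {one_over_x(x):g}")
```

A numerical scan of this kind only suggests how a model behaves near a borderline case; establishing the limit itself requires an analytical argument.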



Science and Faith 

We will see that in Science a theory is usually incomplete, because it cannot fully describe the
complexity of the real world or because it does not predict what we do not know (except for
Quantum Physics or General Relativity). This is the case for theories like the Big Bang (see
section Astrophysics) or the Evolution of species (see sections Populations Dynamics or Decision
and Games Theory), because they are not reproducible in laboratories under identical conditions.
But some other theories predict physical phenomena so accurately that some people believe
that mathematics is the language closest to God (at least for those who believe in a divinity...).




We should distinguish between different scientific currents: 

• "Realism" is a doctrine where physical theories have the aim to describe reality as it is in 
itself, in its unobservable components. 

• "Instrumentalism" is a doctrine where theories are only tools to predict observations but 
do not describe reality itself. 

• "Fictionalism" is the doctrine where the content repository (principles and postulates) of 
theories is just an illusion, useful only to ensure the linguistic articulation of the funda- 
mental equations. 


Even if today the scientific theories are sponsored by many specialists, alternative theories have 
valid arguments and we can not totally dismiss them. However, the creation of the world in 
seven days as described in the Bible is difficult to accept, and many believers recognize that a 
literal reading of the Bible is not compatible with the current state of our knowledge and that 
is more prudent to interpret it as a parable. If science never provides definitive answer, it is no 
longer possible to ignore it. 

Faith (whether religious, superstitious, pseudo-scientific or other) on the contrary is intended 
to provide absolute truths of a different nature as it is a personal unverifiable belief. In fact, 
one of the functions of religion is to give meaning to the phenomena that can not be explained 
rationally. Progress of knowledge trough science therefore cause sometimes questioning the 
religious dogma. 

Conversely, except try to impose his own faith (which is nothing but a subjective and intimate 
personal conviction ) to others, we must defy the natural temptation to characterize scientifically 
proven fact extrapolations of scientific models beyond their scope. 

The word "science" is, as we have already mentioned above, increasingly used to argue that 
there is a scientific evidence where there is only a belief (some web pages like this proliferate 
always more and more). According to its detractors it is, for example, the case of the movement 
of Scientology (but there are many others). According to them, we should rather speak about 
"occult sciences". 

The occult sciences and traditional sciences have existed since antiquity; they consist of a body of
mysterious knowledge and practices designed to penetrate and dominate the secrets of nature.
Over the past centuries, they have been progressively excluded from science. The philosopher
Karl Popper reflected at length on the nature of the demarcation between science
and pseudoscience. After noticing that it is possible to find observations that confirm almost any
theory, he proposed a methodology based on falsifiability. According to him, a theory must,
to deserve the adjective "scientific", guarantee the impossibility of some events. It thereby
becomes refutable, and so (and only then) fit to be part of science. It would suffice to observe
any one of these events to invalidate the theory, and thus open the way to improving it.

Let us also note that a major difference between science books and religious books is that
if you destroyed the latter, in a thousand years’ time they would not come back just as they were.
Whereas if we took every science book and every fact and destroyed them all, in a thousand
years they would all be back, because all the same tests would give the same results.

4.0.1 Baloney detection kit 

Through their training, scientists are equipped with what Carl Sagan named the "baloney
detection kit" or "bullshit detection kit", that is, a set of cognitive tools and techniques that
fortify the mind against penetration by falsehoods and help draw boundaries between science and
pseudoscience. It is not merely a tool of science; it contains invaluable tools of healthy skepticism
that apply just as elegantly, and just as necessarily, to everyday life. By adopting the kit, we can all
shield ourselves against clueless guile and deliberate manipulation.

There are many versions of this detection kit, but here is a fairly complete one (though still
incomplete by construction), as proposed by Michael Shermer (founding publisher of Skeptic
magazine and author of The Borderlands of Science):

1. How reliable is the source of the claim? 

Pseudoscientists often appear quite reliable, but when examined closely, the facts and 
figures they cite are distorted, taken out of context or occasionally even fabricated. Of 
course, everyone makes some mistakes. And as historian of science Daniel Kevles 
showed so effectively in his book The Baltimore Affair, it can be hard to detect a fraudu- 
lent signal within the background noise of sloppiness that is a normal part of the scientific 
process. The question is, Do the data and interpretations show signs of intentional dis- 
tortion? When an independent committee established to investigate potential fraud scru- 
tinized a set of research notes in Nobel laureate David Baltimore’s laboratory, it revealed 
a surprising number of mistakes. Baltimore was exonerated because his lab’s mistakes 
were random and nondirectional... So in science, there are no authorities. At most, there 
are experts!

2. Does this source often make similar claims? 

Pseudoscientists have a habit of going well beyond the facts. Flood geologists (creation- 
ists who believe that Noah’s flood can account for many of the earth’s geologic forma- 
tions) consistently make outrageous claims that bear no relation to geological science. Of 
course, some great thinkers do frequently go beyond the data in their creative specula- 
tions. Thomas Gold of Cornell University is notorious for his radical ideas, but he has 
been right often enough that other scientists listen to what he has to say. Gold proposes, 
for example, that oil is not a fossil fuel at all but the by-product of a deep, hot biosphere 
(microorganisms living at unexpected depths within the crust). Hardly any earth scientists 
with whom I have spoken think Gold is right, yet they do not consider him a crank. Watch 
out for a pattern of fringe thinking that consistently ignores or distorts data. 

3. Have the claims been verified by another source? 

Typically pseudoscientists make statements that are unverified or verified only by a source 
within their own belief circle. We must ask, Who is checking the claims, and even who 
is checking the checkers? The biggest problem with the cold fusion debacle, for instance,
was not that Stanley Pons and Martin Fleischmann were wrong. It was that they
announced their spectacular discovery at a press conference before other laboratories ver- 
ified it. Worse, when cold fusion was not replicated, they continued to cling to their claim. 
Outside verification is crucial to good science. 


4. How does the claim fit with what we know about how the world works? 

An extraordinary claim must be placed into a larger context to see how it fits. When 
people claim that the Egyptian pyramids and the Sphinx were built more than 10,000 
years ago by an unknown, advanced race, they are not presenting any context for that 
earlier civilization. Where are the rest of the artifacts of those people? Where are their 
works of art, their weapons, their clothing, their tools, their trash? Archaeology simply 
does not operate this way. 

5. Has anyone gone out of the way to disprove the claim, or has only supportive evidence 
been sought? 

This is the "confirmation bias" (we will come back to cognitive biases in the section on
Decision Theory), or the tendency to seek confirmatory evidence and to reject or ignore
disconfirmatory evidence. The confirmation bias is powerful, pervasive and almost im- 
possible for any of us to avoid. It is why the methods of science that emphasize checking 
and rechecking, verification and replication, and especially attempts to falsify a claim, are 
so critical. 

6. Does the preponderance of evidence point to the claimant’s conclusion or to a different
one?

The theory of evolution, for example, is "proved" through a convergence of evidence 
from a number of independent lines of inquiry. No one fossil, no one piece of biological 
or paleontological evidence has "evolution" written on it; instead tens of thousands of 
evidentiary bits add up to a story of the evolution of life. Creationists conveniently ignore 
this confluence, focusing instead on trivial anomalies or currently unexplained phenom- 
ena in the history of life. 

7. Is the claimant employing the accepted rules of reason and tools of research, or have 
these been abandoned in favor of others that lead to the desired conclusion? 

A clear distinction can be made between SETI (Search for Extraterrestrial Intelligence) 
scientists and UFOlogists. SETI scientists begin with the null hypothesis that ETIs do not 
exist and that they must provide concrete evidence before making the extraordinary claim 
that we are not alone in the universe. UFOlogists begin with the positive hypothesis that 
ETIs exist and have visited us, then employ questionable research techniques to support 
that belief, such as hypnotic regression (revelations of abduction experiences), anecdo- 
tal reasoning (countless stories of UFO sightings), conspiratorial thinking (governmen- 
tal cover-ups of alien encounters), low-quality visual evidence (blurry photographs and 
grainy videos), and anomalistic thinking (atmospheric anomalies and visual mispercep- 
tions by eyewitnesses). 

8. Is the claimant providing an explanation for the observed phenomena or merely deny- 
ing the existing explanation? 

This is a classic debate strategy: criticize your opponent and never affirm what you believe,
to avoid criticism. It is next to impossible to get creationists to offer an explanation for
life (other than "God did it"). Intelligent Design (ID) creationists have done no better, 
picking away at weaknesses in scientific explanations for difficult problems and offering
in their stead, "ID did it." This stratagem is unacceptable in science.

9. If the claimant proffers a new explanation, does it account for as many phenomena as 
the old explanation did? 

Many HIV/AIDS skeptics argue that lifestyle causes AIDS. Yet their alternative theory 
does not explain nearly as much of the data as the HIV theory does. To make their 
argument, they must ignore the diverse evidence in support of HIV as the causal vec- 
tor in AIDS while ignoring the significant correlation between the rise in AIDS among 
hemophiliacs shortly after HIV was inadvertently introduced into the blood supply. 

10. Do the claimant’s personal beliefs and biases drive the conclusions, or vice versa? 

All scientists hold social, political and ideological beliefs that could potentially slant their
interpretations of the data (this is a form of "confirmation bias", also named "cherry picking",
which is also the main reason why non-scientists reject scientific results and tools), but how do
those biases and beliefs affect their research in practice? Usually, during peer review,
such biases and beliefs are rooted out, or the paper or book is rejected.

Going further, we can refine the analysis of reasoning fallacies. Here is a more extensive list:

1. Ad hominem: An ad hominem argument attacks the messenger, not the message itself. 

2. Argument from authority: Argument that relies on the identity of an authority rather than 
the components of the argument itself. 

3. Argument from adverse consequences: Saying that because the implications of a state- 
ment being true would create negative results, it must not be true. 

4. Appeal to ignorance: If something is not known to be false, it must be true. 

5. Special pleading: Stating a universal principle, then insisting that it doesn’t apply to your 
assertions for some reason. 

6. Begging the question / assuming the answer: This occurs when an argument relies on a
premise that already assumes its conclusion. It is also named "circular reasoning" or "circular logic".

7. Observational selection: Looking at only positive evidence while ignoring the negative 
and vice versa. 

8. Statistics of small numbers: Using small numbers in order to report large percentages.

9. Misunderstanding of the nature of statistics: Ignorance of central statistical assumptions
and of the definition of metrics (confusing correlation with causation, neglecting the sample
size, and an aversion to mathematics are well-known examples).

10. Post hoc, ergo propter hoc: Attributing an effect to a cause solely on the basis of chronology.

11. Excluded middle, or false dichotomy: Portraying an issue or argument as having only two 
options and no spectrum in between. 

12. Short-term vs. long-term: Assuming a current trend has remained constant throughout its 
history and will continue to do so in the future, even though no evidence suggests such an 
extrapolation is justified. 


13. Slippery slope, related to excluded middle: Saying something is wrong because it is next 
to or loosely related to something wrong. 

14. Suppressed evidence and half-truths: Drawing an unwarranted conclusion from premises 
that are at least in part correct. 

15. Weasel words: The usage of vague, non-specific references. 
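
Fallacy 8 in the list above (statistics of small numbers) can be illustrated with a short computation. The following is only a minimal sketch in Python: the function name percent_change and both pairs of counts are hypothetical, invented purely for illustration. It suggests how the same absolute increase of two cases yields a spectacular percentage when the starting number is tiny and a negligible one when it is large.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return 100.0 * (after - before) / before


# Hypothetical counts, invented purely for illustration (not real data).
small_town = (1, 3)        # 1 case last year, 3 cases this year
large_city = (1000, 1002)  # same absolute increase of 2 cases

for name, (before, after) in [("small town", small_town),
                              ("large city", large_city)]:
    change = percent_change(before, after)
    print(f"{name}: {before} -> {after} cases, {change:+.1f}%")
```

Reporting only the percentage of the first line, without the underlying counts, is one way a small number can be dressed up as a large effect.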

In addition to teaching us what to do when evaluating a claim to knowledge, any good baloney 
detection kit must also teach us what not to do. It helps us recognize the most common and per- 
ilous fallacies of logic and rhetoric. Many good examples can be found in religion and politics, 
because their practitioners are so often obliged to justify two contradictory propositions. 

Finally, we would like to quote Lavoisier: «The physicist may also, in the silence of his
laboratory and his study, perform patriotic functions; he can, through his work, reduce the mass
of evils which afflict mankind and, had he contributed, by the new roads that he opened for
himself, only to prolonging by a few years, or even a few days, the average life of humans, he
could also aspire to the glorious title of benefactor of humanity.»





Mathematics is the ultimate form of forced art. (unknown) 


4. Arithmetic

1 Proof Theory
1.1 Paradoxes
1.2 Propositional Calculus
1.3 Predicate Calculus
1.4 Proofs
2 Numbers
2.1 Digital Bases
2.2 Type of Numbers
3 Arithmetic Operators
3.1 Binary Relations
3.2 Fundamental Arithmetic Laws
3.3 Arithmetic Polynomials
3.4 Absolute Value
3.5 Calculation Rules (operators priorities)
4 Number Theory
4.1 Principle of good order
4.2 Induction Principle
4.3 Divisibility
5 Set Theory
5.1 Zermelo-Fraenkel Axiomatic
5.2 Set Operations
5.3 Functions and Applications
6 Probabilities
6.1 Event Universe
6.2 Kolmogorov’s Axioms
6.3 Conditional Probabilities
6.4 Martingales
6.5 Combinatorial Analysis
6.6 Markov Chains
7 Statistics
7.1 Samples
7.2 Averages
7.3 Type of variables
7.4 Fundamental postulate of statistics
7.5 Diversity Index
7.6 Distribution Functions (probabilities laws)
7.7 Likelihood Estimators
7.8 Finite Population Correction Factor
7.9 Confidence Intervals
7.10 Weak Law of Large Numbers
7.11 Characteristic Function
7.12 Central Limit Theorem
7.13 Univariate Hypothesis and Adequation tests
7.14 Robustness
7.15 Multivariate Statistics
7.16 Survival Statistics
7.17 Propagation of Errors (experimental uncertainty analysis)
7.18 A World without statistics


Proof Theory 

We have chosen to begin the study of Applied Mathematics with the theory that
seems to us the most fundamental and important in the field of pure and exact
sciences: Proof Theory. Proof theory and propositional calculus (logic)
have the following objectives in this book:

1. Teach the reader how to reason and demonstrate (prove), independently of the
field of specialization.

2. Show that the process of a demonstration (proof) is independent of the language used.

3. Prepare the reader for Logic Theory (see section Logic Systems).

4. Prepare the path to Gödel’s incompleteness theorem (main goal of this section!).

5. Prepare the reader for Automata Theory (see section Automata Theory).

Gödel’s theorem is probably the most exciting point because, if we define religion as a system of
thought that contains unprovable statements, and therefore elements of faith, then Gödel tells
us that mathematics is not only a religion, but that it is the only religion that can prove it is
one.


R1. It is (very) strongly advised to read this section in parallel with those on Automata
Theory and Logical Systems (including Boolean Algebra) available in the Theoretical
Computing chapter of this book.

R2. We should approach Proof Theory as a pleasant curiosity which basically
brings little beyond working and reasoning methods. Moreover, its purpose is not
to show that everything is provable, but that any proof can be written in a common
language starting from a finite number of rules.


Often, when a student arrives in a graduate class, he has learned how to calculate or to use
algorithms, but has learned little, if anything, about how to reason. For all reasoning, visual
support is a powerful tool (a picture is worth a thousand words), and people who do not see that
the solution appears when a given curve or straight line is drawn, or who cannot visualize in
space, are really penalized.

During high school we already manipulate unknown objects, but mostly to make calculations,
and when we reason about objects represented by letters, we can replace them visually by a real
number, a vector, etc. At a given level, we ask people to reason about more abstract structures and
therefore to work on unknown objects which are elements of a set that is itself unknown, for example
elements of an arbitrary group (see section Set Theory). This visual support then no longer exists.

We so often ask students to reason and to prove properties, but almost no one has ever
taught them to reason properly, to write proofs, or to check proofs. If we ask a graduate student what
a proof is, he will most likely have some difficulty answering. He may say that it is a text in
which there are keywords like "therefore", "because", "if", "if and only if", "take an x such that",
"assume", "lemma", "theorem", "let us look for a contradiction", etc. But he will probably be
unable to give the grammar of these texts or their foundations, and besides, his teachers, if they
have not taken a course in Proof Theory, would probably be unable to do so either.

To understand this situation, remember that a child does not need to know grammar in order to
speak. He imitates those around him and it works very well: most of the time a six-year-old child
knows how to use complicated sentences without ever having studied grammar. Most teachers also
do not know the grammar of reasoning but, for them, the imitation process has worked well and thus
they reason correctly. The experience of the majority of university teachers shows that this process
of imitation works well for very good students, for whom it is sufficient, but that it works much less
well, if at all, for many others.

As long as the complexity level is low (especially in "equational" reasoning), grammar
is almost useless, but when the level increases or when we do not understand why something
is wrong, it becomes necessary to do some grammar in order to progress. Teachers and students are
familiar with the following situation: in a school assignment, the corrector crosses out a whole page
with a large red line and writes "false" in the margin. When the student asks what is wrong, the
corrector can only say things like "this has no relation to the requested proof", "nothing is
right", ..., which obviously does not help the student to understand. This is partly because the text
written by the student uses the appropriate words but in a more or less random way, so that no
meaning can be given to the assembly of these words. In addition, the teacher does not have the
tools to explain what is wrong. We must therefore give them to him!

These tools exist but are fairly recent. Proof theory is a branch of mathematical logic whose
origin lies in the crisis of the foundations: there was doubt about what we had the "right" to do
in mathematical reasoning (see the "foundations crisis" further below). Paradoxes appeared,
and it then became necessary to clarify the rules of proof and to verify that these rules are not
contradictory. This theory appeared in the early 20th century, which is very recent, since most of
the mathematics taught in the first half of university studies has been known since the 16th-17th century.

1.0.1 Foundations Crisis 

For the Greek philosophers, geometry was considered the highest form of knowledge, a powerful
key to the metaphysical mysteries of the universe. It was rather a mystical belief, and the
link between mysticism and religion was made explicit in cults like that of the Pythagoreans.
No culture has since deified a man for discovering a geometrical theorem! Later, mathematics
was regarded as the model of a priori knowledge in the Aristotelian tradition of rationalism.

The Greek philosophers’ awe of mathematics has not left us; we find it in the traditional
metaphor of mathematics as the "Queen of the Sciences". It was strengthened by the spectacular
success of mathematical models in science, a success that the Greeks (who lacked even simple
algebra) had not anticipated. Since Isaac Newton’s discovery of integral calculus and the
inverse-square law of gravity in the late 1600s, the phenomenal sciences and higher mathematics
have remained in close symbiosis, to the point that a predictive mathematical formalism
has become the hallmark of a "hard science".

After Newton, during the next two centuries, science aspired to the kind of rigour and purity
that seemed inherent in mathematics. The metaphysical question seemed simple: mathematics
seemed to possess perfect a priori knowledge and, among all the sciences, those that could
be mathematized most completely were the most effective at predicting phenomena. Perfect
knowledge therefore lay in a mathematical formalism that, once attained by science and
embracing all aspects of reality, could found a posteriori empirical knowledge on an a priori
rational logic. It was in this spirit that Marie Jean Antoine Nicolas de Caritat, Marquis de
Condorcet (French philosopher and mathematician), undertook to imagine describing the entire
Universe as a set of partial differential equations to be solved one after the other.

The first crack in this inspiring picture appeared in the second half of the 19th century, when
Riemann and Lobachevsky separately proved that Euclid’s parallel axiom could be replaced
by alternatives that produced "consistent" geometries (we will come back to this word further
below). Riemann’s geometry was modelled on a sphere, Lobachevsky’s on a hyperboloid
of revolution.

The impact of this discovery was later obscured by greater upheavals, but at the time it was a
thunderclap in the intellectual world. The existence of mutually inconsistent axiomatic systems,
each of which could be a model for the phenomenal Universe, called entirely into question the
relation between mathematics and theoretical physics.

When we knew only Euclid, there was only one possible geometry. One could believe that
Euclid’s axioms (see section Euclidean Geometry) were a kind of perfect a priori knowledge
of geometry in the phenomenal world. But suddenly we had three geometries, an embarrassment
for metaphysical subtleties.

How should we choose among the axioms of plane, spherical and hyperbolic geometry
as real descriptions? Because all three are consistent, we cannot choose any of them a priori as a
foundation: the choice must be empirical, based on their predictive power in a given situation.

Of course, theoretical physicists have long been accustomed to choosing a formalism to study
a scientific problem. But it was widely, if unconsciously, accepted that the need to
do so was based on human ignorance and that, with good enough logic or mathematics, one could
infer the right choice from first principles and produce a priori descriptions of reality that merely
had to be confirmed afterwards by empirical verification.

However, Euclidean geometry, seen for hundreds of years as the model of axiomatic perfection
in mathematics, had been dethroned. If we could not know a priori something as basic as the
geometry of space, what hope was there for a purely rational theory that would encompass all of
nature? Psychologically, Riemann and Lobachevsky had struck at the heart of the mathematical
enterprise as it had been conceived until then.

Moreover, Riemann and Lobachevsky had called the nature of mathematical intuition into
question. It had been easy to believe implicitly that mathematical intuition was a form of
perception, a way to glimpse the Platonic world behind reality. But with two other geometries
jostling Euclid’s, no one could ever again be sure of knowing what the world really looks like.

Mathematicians responded to this dual problem with extreme rigour, trying to apply the
axiomatic method to all of mathematics. In the pre-axiomatic period, proofs were often based on
commonly accepted intuitions of the "reality" of mathematics, which could no longer automatically
be regarded as valid.

The new way of thinking about mathematics led to a series of spectacular successes. Yet this
also had a price. The axiomatic method made the connection between mathematics and
phenomenal reality increasingly tenuous. Meanwhile, discoveries suggested that mathematical
axioms that appeared to be consistent with phenomenal experience could lead to dizzying
contradictions with this experience.

Most mathematicians quickly became "formalists", arguing that pure mathematics could only
be regarded as a kind of elaborate philosophical game played with symbols on paper
(this is the view behind Robert Heinlein’s characterization of mathematics as a "zero content
system"). The "Platonic" belief in the reality of mathematical objects, in the old-fashioned
way, seemed fit for the dustbin, despite the fact that mathematicians still feel like
Platonists during the process of mathematical discovery.

Philosophically, then, the axiomatic method led most mathematicians to abandon their previous
beliefs in the metaphysical specificity of mathematics. It also produced the contemporary rupture
between pure and Applied Mathematics. Most of the great mathematicians of the early modern
period (Newton, Leibniz, Fourier, Gauss and others) also worked in the phenomenal sciences. The
axiomatic method hatched the modern idea of the pure mathematician as a great aesthete,
heedless of physics. Ironically, formalism gave pure mathematicians a bad case of
the Platonic attitude. Researchers in Applied Mathematics ceased to mix with them and
learned to fall in behind the physicists.

This takes us to the early 20th century. For the beleaguered minority of Platonists, the worst
was yet to come. Cantor, Frege, Russell and Whitehead showed that all pure mathematics could
be built on the simple foundation of axiomatic Set Theory. This suited the formalists well:
mathematics was being reunified, at least in principle, from a small set of detached
rules. The Platonists were also satisfied: if a great structure appeared, a consistent keystone for
the whole of mathematics, the metaphysical specificity of mathematics could still be saved.

In a negative way, though, a Platonist had the last word. Kurt Gödel threw a grain of sand into
the formalist program of axiomatization when he proved that any axiom system powerful enough
to include the integers had to be either inconsistent (contain contradictions) or
incomplete (too weak to decide the truth or falsity of some statements of the system).
And that is more or less where things stand today. Mathematicians know that many attempts to
advance mathematics as a priori knowledge of the Universe must face numerous paradoxes and
the impossibility of deciding which axiom system describes "real" mathematics. They have been
reduced to hoping that the standard axiomatizations are not inconsistent but merely incomplete,
and to wondering anxiously what contradictions or unprovable theorems are waiting to be
discovered elsewhere.

However, on the empirical front, mathematics continued to be a spectacular success as a
theoretical construction tool. The great successes of 20th-century physics (General Relativity
and Quantum Physics) pushed so far beyond the realm of physical intuition that they could only be
understood by meditating deeply on their mathematical formalism and following its logical
conclusions, even when those conclusions seemed wildly bizarre. What irony! Just as
mathematical perception was coming to appear ever less reliable in pure mathematics, it became
more and more indispensable in phenomenal science.

Against this background, the applicability of mathematics to phenomenal science poses
a more difficult problem than at first appears. The relation between mathematical models
and the prediction of phenomena is complex, not only in practice but also in principle. Even more
complex, as we now know, there are ways to axiomatize mathematics that mutually exclude
one another.

But why is there only one good choice of mathematical model? That is, why is there a mathe- 
matical formalism, for example for quantum physics, so productive that it predicts the discovery 
of new observable particles? 

To answer this question, we can observe that it also works as a kind of definition. For
many phenomenal systems, no such exact predictive formalism has been found, and none seems
plausible. Such examples are easy to find: the climate, or the behaviour of an economy larger
than that of a town, are systems so chaotically interdependent that exact prediction is actually
impossible (not only in practice but in principle).
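
The sensitivity behind the word "chaotically" can be sketched numerically. The example below is an assumption of ours and not a model taken from this book: it uses the logistic map x -> r x (1 - x) with r = 4, a standard toy example of a chaotic system, and the function name logistic_orbit and the starting values are hypothetical. It only illustrates why long-range prediction fails in practice when the initial state is known with finite precision; it does not settle the "in principle" part of the claim.

```python
def logistic_orbit(x0: float, r: float = 4.0, steps: int = 60) -> list:
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two initial states that differ only in the tenth decimal place.
a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)

# Distance between the two trajectories at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]

for n in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {n:2d}: gap = {gaps[n]:.3e}")
```

Running the sketch shows the gap growing from about one part in ten billion to the same order of magnitude as the values themselves, after which the two forecasts have nothing in common.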


1.1 Paradoxes 

Since ancient times, some logicians have noticed the presence of many paradoxes within
rationality. In fact, we can say that, despite their number, these paradoxes are merely illustrations
of a few paradoxical structures. For general culture, let us look at the most famous ones, which
constitute the class of "undecidable propositions".


The paradox of the class of classes (Russell) 

There are two types of classes: those that contain themselves (or reflexive classes:
the class of non-empty sets, the class of classes, ...) and those that do not contain
themselves (or non-reflexive classes: the class of work to be returned, the class of blood
oranges, ...). The question is the following: is the class of non-reflexive classes itself
reflexive or non-reflexive? If it is reflexive, it contains itself and is thus a member of the
class of non-reflexive classes that it represents, which is contradictory. If it is non-reflexive, it
must be included in the class of non-reflexive classes and becomes ipso facto reflexive:
we again face a contradiction.

Russell’s paradox is mainly known in the two following variants:

• Does the set of all sets that do not contain themselves contain itself?

The answer is: if "Yes", then "No", and if "No", then "Yes"...

• The barber shaves those who do not shave themselves, but not those who
shave themselves. So who shaves the barber?

The answer is: if the barber shaves himself, he falls into the category of people
who shave themselves, so he is not shaved by the barber, that is, he does not shave himself...
But if he does not shave himself, he falls into the category of people who are shaved
by the barber, so he shaves himself... The question is again undecidable...

Russell’s paradox challenges the notion of a set as a collection defined by a common property!
In one stroke it undermines logic (undecidable proposition) and set theory, because the very
concept of the set of all sets is an impossibility! Self-reference is at the centre of this logical problem!
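
The barber variant above can be checked mechanically on a small scale. The following is a minimal sketch in Python under assumptions of ours: a hypothetical village of three inhabitants (the barber being one of them) and an exhaustive enumeration of every possible "who shaves whom" relation; the names PEOPLE, PAIRS and has_barber are illustrative only. A barber is an inhabitant b who shaves x exactly when x does not shave himself.

```python
from itertools import product

PEOPLE = range(3)  # hypothetical village of three inhabitants, barber included
PAIRS = [(a, b) for a in PEOPLE for b in PEOPLE]  # (a, b) means "a shaves b"


def has_barber(shaves: set) -> bool:
    """True if some inhabitant b shaves exactly those who do not shave themselves."""
    return any(
        all(((b, x) in shaves) == ((x, x) not in shaves) for x in PEOPLE)
        for b in PEOPLE
    )


# Enumerate every possible relation on the village (2**9 = 512 of them).
villages_with_barber = 0
for bits in product([False, True], repeat=len(PAIRS)):
    shaves = {pair for pair, bit in zip(PAIRS, bits) if bit}
    if has_barber(shaves):
        villages_with_barber += 1

print(villages_with_barber)  # 0
```

No relation passes the test, because for x = b the rule demands that "b shaves b" be both true and false. The size of the village plays no role in that argument; three inhabitants are used here only to keep the enumeration small.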

This paradox also comes back to the question of whether a correctly formulated (logical)
mathematical question necessarily admits an answer. Put another way: is every mathematical
statement provable? It was Gödel who, many years after the statement of Russell’s paradox,
proved mathematically that the answer is no! In other words, there will always be unanswered
questions, because any system (living language or mathematical tool) based on itself is
necessarily incomplete! This is the famous impact of Gödel’s incompleteness theorem!

Let us see another application of Russell’s paradox:


In a library, there are two types of catalogues: those that mention themselves
and those that do not mention themselves. A librarian must draw up a catalogue of
all the catalogues that do not mention themselves. Having completed his work, our librarian
wonders whether or not to mention the very catalogue that he is drafting. At this point, he
is struck by perplexity. If he does not mention this catalogue, it will be a catalogue that does
not mention itself and which should therefore be included in the list of catalogues that do
not mention themselves. On the other hand, if he mentions the catalogue, this catalogue
will become a catalogue that mentions itself and must therefore not be included in this
catalogue, since it is the catalogue of catalogues that do not mention themselves.

A variation of the previous paradox is the well-known liar paradox:


Let us provisionally define lying as the act of stating a false proposition. The
Cretan poet Epimenides said: "All Cretans are liars"; call this proposition P. How can
we decide the truthfulness of P? If P is true then, since Epimenides is a Cretan, he is
lying, so P must be false. P would therefore have to be false in order to be true, which
is contradictory.
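The self-referential core of these paradoxes can be checked mechanically. The following is only a minimal Python sketch, under the assumption that we model the simplified sentence "this sentence is false" by a Boolean v that must equal its own negation; the function name `consistent_values` is hypothetical.

```python
# Hypothetical helper: search for a truth value v such that v == (not v),
# i.e. a value that the sentence "this sentence is false" could consistently take.
def consistent_values():
    return [v for v in (True, False) if v == (not v)]

# An empty list means that neither True nor False is a consistent choice.
print(consistent_values())
```

Neither of the two truth values satisfies the constraint, which is the binary-logic way of saying that the sentence is not a proposition.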

As the logician Ludwig Wittgenstein would have put it, these paradoxes ultimately show that
mathematics is a rather good tool for showing logic but not for talking about it. Giving these
algebraic entities an independent existence through mathematics is madness, and this is what
produces monsters like the set of all sets... Logic is empty and cannot tell reality; it is
restricted to being just a picture of it.



1.1.1 Hypothetical-Deductive Reasoning 

Hypothetical-deductive reasoning is, as we know (see the Introduction of the book), the ability
of the learner to deduce conclusions from pure hypotheses and not only from actual observations.
It is a thought process that seeks to identify a causal explanation of a phenomenon (we will
come back to this during our first steps in physics). The learner who uses this type of reasoning
begins with a hypothesis and then tries to prove or disprove it following the block
diagram below:

[Block diagram; labels recovered from the figure: "empirical observations", "process", "changes in hypothesis"]

Figure 4.1 - Hypothetical-Deductive Reasoning block diagram 

The deductive procedure consists of holding this first proposition as provisionally true (in
logic we name it a "predicate", see further below for more details) and drawing all its logically
necessary consequences, that is to say, looking for its implications.


Consider the proposition P: "X is a man". It implies the following proposition Q:
"X is mortal".

The expression $P \Rightarrow Q$ (if X is a man, then X is necessarily mortal) is a predicative
implication (hence the term "predicate"). There is no case in this example where we can state
P without Q. This is an example of strict implication, as we find in the "syllogism"
(a figure of logical reasoning).



Experts have shown that hypothetical-deductive reasoning develops gradually in children
from the age of six to seven, and that this kind of reasoning is used systematically,
starting from a strict propositional function, only from the age of eleven to twelve.


1.2 Propositional Calculus 

The "propositional calculus" (or "propositional logic") is an indispensable preliminary
to any background in science, philosophy, law, politics, economics, etc. This type of
calculus provides decision or testing procedures, which help to determine when a logical
expression (proposition) is true and, especially, whether it is always true.

Definitions (#3):

D1. An expression that is always true, whatever the content of the variables that
compose it, is named a "valid expression", a "tautology" or a "law of propositional logic".

D2. An expression that is always false is named a "contradiction" or "antilogy".

D3. An expression that is sometimes true, sometimes false is named a "contingent expression".

D4. We name "assertion" an expression of which we can say unambiguously whether it is true or
false.

D5. The "object language" is the language used to write logical expressions.

D6. The "meta-language" is the language used to talk about the object language in everyday
language.


R1. There are expressions that are actually not assertions. For example, the statement
"this statement is false" is a paradox that can be neither true nor false.

R2. Consider a logical expression A. If it is a tautology, we frequently denote it
$\models A$, and $A \models$ if it is a contradiction.

R3. In mathematics we try to prove in a general way that an assertion is true;
to show that it is false, giving a single counterexample is enough.


1.2.1 Propositions (premises) 

Definition (#4): In logic, a "proposition" is a statement that has meaning. That means we can 
say unambiguously whether this statement is true (T) or false (F). This is what we name the 
"Law of excluded middle". 


E1. "I lie" is not a proposition (premise). If we assume that this statement is
true, it affirms its own falsity, so we should conclude that it is false. But
if we assume that it is false, then the author of the statement is not lying, so he is
telling the truth, and the statement would be true...

E2. Another funny example is: 

• Everything has a creator 

• God is that creator 

• God does not have creator 

It’s a solution that fails since it violates its own premise... 

Definition (#5): A proposition in binary logic (where propositions are either true or false)
is therefore never true and false at the same time. This is what we call the "principle of
non-contradiction".

Thus, a property on the set of propositions E is a mapping P from E to the set of "truth
values" {T, F} (True, False):

$$P : E \to \{T, F\} \tag{1.1}$$

We speak of the "associated subset" when the proposition is satisfied only on a portion E' of E,
and vice versa.


In $E = \mathbb{N}$, if P(x) states "x is even", then the associated subset is
$\{0, 2, 4, \ldots, 2k, \ldots\}$, which is indeed only a subset of E but with the same
cardinal (see section Set Theory).


Definition (#6): Let P be a property on the set E. A property Q on E is a "negation" of P if
and only if, for any $x \in E$:

• Q(x) is F (false) if P(x) is T (true)

• Q(x) is T (true) if P(x) is F (false)

We can gather these conditions in a table called a "truth table":

P    Q
T    F
F    T

Table 4.1 - Truth table of values

We can also find or write this table in the following more explicit form:

P        Q
True     False
False    True

Table 4.2 - Truth table of explicit values

or in binary form:

P    Q
1    0
0    1

Table 4.3 - Truth table of binary values

In other words, P and Q always have opposite truth values. We denote this kind of statement
"Q is a negation of P" by:

$$Q \Leftrightarrow \neg P \tag{1.2}$$

where the symbol $\neg$ is the "negation connector".


Expressions must be well-formed expressions (often abbreviated "WFE"). By definition,
any variable is a well-formed expression, and thus $\neg P$ is a well-formed expression. If
P, Q are well-formed expressions, then an expression joining them with a binary connector
(for example $P \Rightarrow Q$) is a well-formed expression (the expression
"I am lying" is not well-formed because it contradicts itself).


1.2.2 Connectors 

There are other types of logical connectors: 

Definition (#7): Let P and Q be two properties defined on the same set E. $P \vee Q$ (read "P OR
Q") is a property on E defined by:

• $P \vee Q$ is true if at least one of P, Q is true

• $P \vee Q$ is false otherwise

We can create the truth table of the "OR connector" or "disjunction connector" $\vee$:

P    Q    $P \vee Q$
T    T    T
T    F    T
F    T    T
F    F    F

Table 4.4 - OR truth table

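It should be easy to convince yourself that if the parts $\mathcal{P}, \mathcal{Q}$ of E are
respectively associated with the properties P, Q, then $\mathcal{P} \cup \mathcal{Q}$ (see section
Set Theory) is associated with $P \vee Q$:

$$\mathcal{P} \cup \mathcal{Q} \leftrightarrow P \vee Q \tag{1.3}$$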


The connector $\vee$ is associative (and there is no doubt that it is commutative). For the proof,
just build a truth table in which you can check that:

$$[P \vee (Q \vee R)] = [(P \vee Q) \vee R] \tag{1.4}$$
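The truth-table check suggested above can be automated. Here is a minimal Python sketch, assuming we represent each side of the relation as an ordinary Python function of Booleans; the helper name `equivalent` is our own and not part of the book.

```python
from itertools import product

def equivalent(f, g, n):
    """Return True if f and g agree on all 2**n assignments of truth values."""
    return all(f(*vals) == g(*vals) for vals in product([True, False], repeat=n))

# Both sides of the associativity relation for the OR connector.
left = lambda p, q, r: p or (q or r)
right = lambda p, q, r: (p or q) or r

associative = equivalent(left, right, 3)
commutative = equivalent(lambda p, q: p or q, lambda p, q: q or p, 2)
print(associative, commutative)
```

The same helper can be reused for any pair of expressions over n variables, since it simply enumerates every row of the truth table.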

Definition (#8): There is also the "AND connector", also named the "conjunction connector" $\wedge$.
Whatever the two properties P, Q defined on E, $P \wedge Q$ is a property on E defined by:

• $P \wedge Q$ is true if both properties P, Q are true (the premises of the famous syllogism are
an example: "All men are mortal" AND "Socrates is a man", therefore Socrates is mortal).

• $P \wedge Q$ is false otherwise

We can create the truth table of the "AND connector" or "conjunction connector" $\wedge$:

P    Q    $P \wedge Q$
T    T    T
T    F    F
F    T    F
F    F    F

Table 4.5 - AND truth table


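It should be almost as easy to convince yourself that if the parts $\mathcal{P}, \mathcal{Q}$ of E
are respectively associated with the properties P, Q, then $\mathcal{P} \cap \mathcal{Q}$ (see section
Set Theory) is associated with $P \wedge Q$:

$$\mathcal{P} \cap \mathcal{Q} \leftrightarrow P \wedge Q \tag{1.5}$$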
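The correspondence between connectors and set operations can be illustrated on a finite set. The following Python sketch is only an illustration under stated assumptions: E is replaced by the finite stand-in `range(20)`, and the two properties ("x is even", "x is a multiple of 3") as well as the helper `associated_subset` are hypothetical choices of ours.

```python
E = set(range(20))            # finite stand-in for the set E
P = lambda x: x % 2 == 0      # hypothetical property "x is even"
Q = lambda x: x % 3 == 0      # hypothetical property "x is a multiple of 3"

def associated_subset(prop, universe):
    """Subset of the universe on which the property is true."""
    return {x for x in universe if prop(x)}

sub_P = associated_subset(P, E)
sub_Q = associated_subset(Q, E)

# The subset associated with "P OR Q" is the union, with "P AND Q" the intersection.
union_ok = associated_subset(lambda x: P(x) or Q(x), E) == sub_P | sub_Q
inter_ok = associated_subset(lambda x: P(x) and Q(x), E) == sub_P & sub_Q
print(union_ok, inter_ok)
```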


The connector $\wedge$ is associative (and there is no doubt that it is commutative). For the proof,
just build a truth table in which you can check that:

$$[P \wedge (Q \wedge R)] = [(P \wedge Q) \wedge R] \tag{1.6}$$

The connectors $\vee$, $\wedge$ are distributive over each other. Using a simple truth table
with eight rows, we can show that:

$$[P \vee (Q \wedge R)] = [(P \vee Q) \wedge (P \vee R)] \tag{1.7}$$

as well as:

$$[P \wedge (Q \vee R)] = [(P \wedge Q) \vee (P \wedge R)] \tag{1.8}$$
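For readers who wish to see the truth table behind the first distributivity relation, here is a short Python sketch that builds and prints it. The helper `truth_table` is a hypothetical name of ours; Python's `or` and `and` play the role of the two connectors.

```python
from itertools import product

def truth_table(exprs, n):
    """Rows of (inputs..., value of each expression) over all 2**n assignments."""
    rows = []
    for vals in product([True, False], repeat=n):
        rows.append(vals + tuple(e(*vals) for e in exprs))
    return rows

lhs = lambda p, q, r: p or (q and r)          # P OR (Q AND R)
rhs = lambda p, q, r: (p or q) and (p or r)   # (P OR Q) AND (P OR R)

rows = truth_table([lhs, rhs], 3)
print("P Q R | lhs | rhs")
for p, q, r, a, b in rows:
    cells = ["T" if v else "F" for v in (p, q, r, a, b)]
    print(" ".join(cells[:3]), "|", cells[3], " |", cells[4])

# The relation holds when the last two columns coincide on every row.
distributive = all(a == b for *_, a, b in rows)
```

The second relation can be checked in the same way by swapping the roles of `or` and `and`.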

Definition (#9): The "negation" operator $\neg$ transforms a True value into a False value and
vice versa:

$$\neg T = F \tag{1.9}$$

$$\neg F = T \tag{1.10}$$

So in logic, negation, also named "logical complement", is an operation that takes a proposition
P to another proposition "not P", written $\neg P$ or sometimes $\bar{P}$, which is interpreted
intuitively as being True when P is False and False when P is True. Negation is thus a unary
(single-argument) logical connective.

As we will prove in detail in the section Logical Systems (using a simple truth table), "De
Morgan's laws" provide a way of distributing negation over disjunction and conjunction:

$$\neg(P \wedge Q) \Leftrightarrow [(\neg P) \vee (\neg Q)] \tag{1.11}$$

$$\neg(P \vee Q) \Leftrightarrow [(\neg P) \wedge (\neg Q)] \tag{1.12}$$



To see the details of all logical operators, the reader should read the section Logical
Systems (see chapter Theoretical Computing), where the identity, the double negation,
the idempotence, the associativity, the distributive properties and the De Morgan relations
are presented more formally and in full detail.

Let us now come back to the "logical implication connector", sometimes also named just the
"conditional", denoted by the symbol $\Rightarrow$.

In some books on propositional calculus, this connector is denoted by the symbol $\supset$, and
in proof theory the symbol $\rightarrow$ is often preferred.

Let P, Q be two properties on E. $P \Rightarrow Q$ is a property on E defined by:

P1. $P \Rightarrow Q$ is False if P is True and Q is False.

P2. $P \Rightarrow Q$ is True otherwise.

In other words, the fact that P logically implies Q means that Q is True for every assignment for
which P is True. The implication is therefore the famous "if... then ...".

If we write the truth table of the implication (caution with the second-to-last line!):

P    Q    $P \Rightarrow Q$
T    T    T
T    F    F
F    T    T
F    F    T

Table 4.6 - Implication truth table

In other words, when the premise P is False, the implication is True whatever the conclusion Q.
When the premise P is True, the implication is True only if the conclusion Q is True.
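The definition P1/P2 translates directly into a small function. This is only a sketch in Python; the name `implies` is our own, and the function simply returns False in the single case "P True, Q False" and True otherwise.

```python
def implies(p, q):
    """Material implication: False only when p is True and q is False."""
    if p and not q:
        return False
    return True

# Enumerate the four rows in the usual order TT, TF, FT, FF.
table = [(p, q, implies(p, q)) for p in (True, False) for q in (True, False)]
for p, q, value in table:
    print(p, q, value)
```

The last two rows, where P is False, both evaluate to True: this is the point that usually surprises on first reading.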


Consider the proposition: "If you get your diploma, I buy you a computer".

Of all the cases, only one corresponds to a broken promise: the one where the child
graduates and still has no computer (second line in the table above).

What exactly does this promise mean, which we write as follows:

"You have your diploma $\Rightarrow$ I buy you a computer"?

Exactly this:

• If you graduate, I will certainly buy you a computer (I cannot fail to buy it).

• If you do not graduate, I have said nothing.

The implication tells us that from a false proposition we can deduce any proposition (last two
lines of the truth table).


In a course taught by Russell on the subject "from a false proposition, any proposition
can be inferred", a student asked him the following question (anecdote or legend?):

• "Are you saying that from the proposition 2 + 2 = 5, it follows that you are the
Pope?"

• "Yes", answered Russell.

• "And could you prove it?", asked the skeptical student...

• "Certainly", answered Russell, who immediately offered the following proof:

1. Suppose that 2 + 2 = 5.

2. Subtract 3 from each member of the equality; we thus get 1 = 2.

3. By symmetry 2 = 1.

4. The Pope and I are 2. Since 2 = 1, the Pope and I are 1. It follows that I am the Pope.

The implication connector is essential in mathematics, philosophy, etc. It is the backbone of any
proof, evidence or deduction. It has the following useful properties (normally easy to check
with a small truth table):

$$(P \Rightarrow Q) \Leftrightarrow [(\neg Q) \Rightarrow (\neg P)] \tag{1.14}$$

$$(P \Rightarrow Q) \Leftrightarrow [(\neg P) \vee Q] \tag{1.15}$$

And we have from the last property (again verifiable by a truth table):

$$\neg(P \Rightarrow Q) \Leftrightarrow [P \wedge (\neg Q)] \tag{1.16}$$
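These three properties can be verified by brute force over the four possible assignments. The Python sketch below assumes the implication is given by its truth table (stored in the hypothetical dictionary `IMPLIES`), so that the checks do not presuppose the properties they test.

```python
from itertools import product

# Truth table of the implication: False only for (True, False).
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

def implies(p, q):
    return IMPLIES[(p, q)]

pairs = list(product([True, False], repeat=2))

# Contraposition: (P => Q) has the same value as (not Q => not P).
contraposition = all(implies(p, q) == implies(not q, not p) for p, q in pairs)
# Implication as a disjunction: (P => Q) has the same value as (not P or Q).
as_disjunction = all(implies(p, q) == ((not p) or q) for p, q in pairs)
# Negated implication: not (P => Q) has the same value as (P and not Q).
negated = all((not implies(p, q)) == (p and not q) for p, q in pairs)

print(contraposition, as_disjunction, negated)
```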

The "logical equivalence connector" or "biconditional connector", denoted most of the time by
"$\Leftrightarrow$" or sometimes by "$\leftrightarrow$", is defined by:

$$(P \Leftrightarrow Q) \Leftrightarrow [(P \Rightarrow Q) \wedge (Q \Rightarrow P)] \tag{1.17}$$

In other words, the first expression takes the same value as the second for every assignment.
The same holds for the following relation, which is more "atomic" since the logical equivalence
is reduced to the use of $\wedge$, $\vee$ and the negation $\neg$ only (a combination of what we
have seen above):

$$(P \Leftrightarrow Q) \Leftrightarrow [(P \wedge Q) \vee (\neg P \wedge \neg Q)] \tag{1.18}$$

When we prove such equivalence of two expressions we can therefore say that: "we prove that 
the equivalence is a tautology". 

The truth table of the equivalence is logically given by:

P    Q    $P \Rightarrow Q$    $Q \Rightarrow P$    $P \Leftrightarrow Q$
T    T    T                    T                    T
T    F    F                    T                    F
F    T    T                    F                    F
F    F    T                    T                    T

Table 4.7 - Truth table of the equivalence



$P \Leftrightarrow Q$ means (when it is true!) that "P and Q always have the same truth value" or
"P and Q are equivalent". It is True if P and Q have the same value, False otherwise.

Of course (it is a tautology):

$$\neg(P \Leftrightarrow Q) \Rightarrow (P \Rightarrow \neg Q) \tag{1.19}$$

The relation $P \Leftrightarrow Q$ is equivalent to saying that P is a necessary and sufficient
condition for Q and that Q is a necessary and sufficient condition for P.

The conclusion is that the conditions of the types: "necessary", "sufficient", "necessary and 
sufficient" can be reformulated with the terms "only if", "if", "if and only if". 


1. $P \Rightarrow Q$ reflects the fact that Q is a necessary condition for P or, in other words,
P is True only if Q is True (in the truth table, when $P \Rightarrow Q$ is equal to 1 we see
that P is 1 only if Q is also 1). We also say that if P is true then Q is true.

2. $P \Leftarrow Q$ or, what is the same, $Q \Rightarrow P$ reflects the fact that Q is a
sufficient condition for P or, in other words, P is True if Q is True (in the truth table,
when $Q \Rightarrow P$ takes the value 1 we see that P is 1 if Q is 1 too).

3. $P \Leftrightarrow Q$ reflects the fact that Q is a necessary and sufficient condition for P
or, in other words, P is True if and only if Q is True (in the truth table, when
$P \Leftrightarrow Q$ takes the value 1 we see that P is 1 if and only if Q is equal to 1).
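A concrete case may help to separate "necessary" from "sufficient". The sketch below is only an illustration in Python under our own assumptions: the two properties ("n is a multiple of 4", "n is even") and the bounded range 1..100 are hypothetical choices, and `implies` encodes the implication as "not P or Q".

```python
def implies(p, q):
    return (not p) or q

N = range(1, 101)            # bounded stand-in for the natural numbers
P = lambda n: n % 4 == 0     # hypothetical P: "n is a multiple of 4"
Q = lambda n: n % 2 == 0     # hypothetical Q: "n is even"

# Q is necessary for P: P => Q holds for every n tested.
q_necessary_for_p = all(implies(P(n), Q(n)) for n in N)
# Q is not sufficient for P: Q => P fails, for instance at n = 2.
q_sufficient_for_p = all(implies(Q(n), P(n)) for n in N)

print(q_necessary_for_p, q_sufficient_for_p)
```

Being even is thus necessary but not sufficient for being a multiple of 4, so the two properties are not equivalent.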


The expression "if and only if" therefore corresponds to a logical equivalence and can
only be used to describe a bi-implication!

The first stage of propositional calculus is the formalization of natural language statements. To
do this work, propositional calculus provides three types of tools:

1. The "propositional variables" (P, Q, R, ...) symbolize any simple propositions. If the same
variable occurs multiple times, it symbolizes the same proposition each time.

2. The five logical operators: $\neg$, $\wedge$, $\vee$, $\Rightarrow$, $\Leftrightarrow$.

3. Punctuation is reduced to opening and closing parentheses only, which organize the reading so
as to avoid ambiguity.



Symbol    Usage

$\neg$    "Negation" is an operator that acts on only one proposition; it is unary or
monadic. "It is not raining" is written $\neg P$. This statement is true
if and only if P is false (in this case, if it is false that it is raining). The
conventional use of negation is characterized by the double negation
law: $\neg\neg P$ is equivalent to P.

$\wedge$    The "conjunction" or "logical product" is a binary operator; it connects
two propositions. "Every man is mortal AND my car loses oil" is written
$P \wedge Q$. This latter expression is true if and only if P is true and Q is
true.

$\vee$    The "disjunction" or "logical sum" is also a binary operator. $P \vee Q$
is true if and only if P is true OR Q is true or both are true. We can
understand the OR in two ways: either inclusively or exclusively. In
the first case $P \vee Q$ is true if P is true, if Q is true or if P and Q
are both true. In the second case, $P \vee Q$ is true if P is true or if Q is
true but not if both are. The disjunction of propositional calculus is the
inclusive OR, and the exclusive OR, that is to say the XOR, is given
the name of "alternative".

$\Rightarrow$    The "implication" is also a binary operator. It corresponds roughly to
the linguistic pattern "If ... then ...". "If I have time, I will go to see a movie"
is written $P \Rightarrow Q$. This latter relation is false if P is true and
Q is false. If the consequent (here Q) is true, the implication $P \Rightarrow Q$ is
true. When the antecedent (here P) is false, the implication is always
true. This latter remark can be understood if one refers to statements of
the type: "If we could put Paris in a bottle, the Eiffel Tower would be
used as a plug". In summary, an implication is false if and only if its
antecedent is true and its consequent is false.

$\Leftrightarrow$    The "bi-implication" or "equivalence" is, too, a binary operator: it
symbolizes the terms "... if and only if ..." and "... is equivalent to
...". The equivalence of two propositions is true if they have the same
truth value. The bi-implication therefore expresses a form of identity,
which is why it is often used in definitions.


Table 4.8 - Summary of logical core operators

The reader will sometimes find, in books by authors who prefer to keep natural language to a
minimum, the sign $\therefore$ placed before a logical consequence, such as the conclusion of a
syllogism. The symbol consists of three dots placed in an upright triangle and is read
"therefore". We can also make use of the symbol $\because$, which is read "because":


$\because$ All men are mortal
$\because$ Socrates is a man
$\therefore$ Socrates is mortal



In this book we will avoid this notation, as engineers make little use of it.

It is possible to establish equivalences between these operators. We have already seen how
the biconditional can be defined as a product of reciprocal conditionals; let us now see other
ones:

$$(P \Rightarrow Q) \Leftrightarrow \neg(P \wedge \neg Q)$$
$$(P \Rightarrow Q) \Leftrightarrow (\neg P \vee Q)$$
$$(P \vee Q) \Leftrightarrow (\neg P \Rightarrow Q)$$
$$(P \wedge Q) \Leftrightarrow \neg(P \Rightarrow \neg Q)$$


The classical operators $\wedge$, $\vee$, $\Leftrightarrow$ can therefore be defined using the
canonical operators $\neg$, $\Rightarrow$ through equivalence laws between them.


Also notice the two relations of De Morgan (see section Boolean Algebra for the proof):

$$\neg(P \vee Q) \Leftrightarrow (\neg P \wedge \neg Q)$$
$$\neg(P \wedge Q) \Leftrightarrow (\neg P \vee \neg Q)$$

They allow us to transform a disjunction into a conjunction and vice versa:

$$(P \vee Q) \Leftrightarrow \neg(\neg P \wedge \neg Q)$$
$$(P \wedge Q) \Leftrightarrow \neg(\neg P \vee \neg Q) \tag{1.21}$$
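To make the idea of "canonical operators" concrete, here is a minimal Python sketch that rebuilds OR, AND and the equivalence from negation and implication alone, then compares them with Python's built-in operators. All function names (`neg`, `implies`, `or_`, `and_`, `iff`) are our own, and the implication is taken as primitive through its truth table.

```python
from itertools import product

def neg(p):
    return not p

def implies(p, q):
    # Primitive implication: False only when p is True and q is False.
    return q if p else True

def or_(p, q):
    return implies(neg(p), q)            # P OR Q  defined as  (not P) => Q

def and_(p, q):
    return neg(implies(p, neg(q)))       # P AND Q  defined as  not (P => not Q)

def iff(p, q):
    return and_(implies(p, q), implies(q, p))

pairs = list(product([True, False], repeat=2))

definitions_ok = all(
    or_(p, q) == (p or q) and and_(p, q) == (p and q) and iff(p, q) == (p == q)
    for p, q in pairs
)
de_morgan_ok = all(
    (not (p or q)) == ((not p) and (not q))
    and (not (p and q)) == ((not p) or (not q))
    for p, q in pairs
)
print(definitions_ok, de_morgan_ok)
```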


1.2.3 Decision procedures 

We have previously introduced the basic elements allowing us to operate on expressions built from
properties (propositional variables) without saying much about the handling of such expressions.
In propositional calculus there are two ways to establish that a proposition is a law of
propositional logic. We can either:

1. Use non-axiomatized procedures

2. Use axiomatic and demonstrative procedures


In many books these procedures are presented before the structure of the propositional 
language. We chose here to do the opposite thinking that the approach would be easier. 



1.2.3.1 Non-axiomatic procedural decisions

Several of these methods exist, but we will limit ourselves here to the simplest of them, 
that of the matrix calculation, often referred to as "methods of truth tables". 

The construction procedure is as we have already seen quite simple. Indeed, the truth value of a 
complex expression is a function of the truth value of the simple statements that compose it, and 
finally based on the truth value of propositional variables that makes it. Considering all possible 
combinations of truth values of propositional variables, we can determine the truth values of the 
complex expression. 

Truth tables, as we have seen it, give us the possibility to decide, about any proposition, if 
this latter is a tautology (always true), a contradiction (always false) or a contingent expression 
(sometimes true, sometimes false). 
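The decision described above is entirely mechanical, so it can be sketched in a few lines of Python. The function name `classify` is hypothetical; it evaluates the expression on all $2^n$ assignments of its n variables and reports which of the three cases occurs.

```python
from itertools import product

def classify(expr, n):
    """Classify a Boolean function of n variables by exhausting its truth table."""
    values = [expr(*vals) for vals in product([True, False], repeat=n)]
    if all(values):
        return "tautology"
    if not any(values):
        return "contradiction"
    return "contingent"

print(classify(lambda p: p or not p, 1))       # always true
print(classify(lambda p: p and not p, 1))      # always false
print(classify(lambda p, q: p or not q, 2))    # depends on the assignment
```

The cost grows as $2^n$, which is harmless for the small expressions of this section but is the reason why truth tables are not used for expressions with many variables.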

We can distinguish at least four ways to combine propositional variables, brackets and
connectors:

Name                    Description                                   Example

Malformed statement     Nonsense. Neither true nor false

Tautology               Statement always true                         $P \vee \neg P$

Contradiction           Statement always false                        $P \wedge \neg P$

Contingent statement    Statement sometimes true, sometimes false


Table 4.9 - Combination of propositional variables

The method of truth tables helps to determine the type of the well-formed expression we are
facing. It requires, in principle, no invention: it is "only" a mechanical procedure.
Axiomatized procedures, however, are not entirely mechanical. Inventing a proof within
an axiomatized system sometimes requires skill, practice or luck. Regarding truth
tables, here is the protocol to follow:

When facing a well-formed expression, or truth function, we first determine how many distinct
propositional variables we are dealing with. We then examine the various arguments that
form this expression. We then construct a table with $2^n$ rows (n being the number of
variables, without forgetting that they are binary variables!) and a number of columns equal to
the number of arguments plus columns for the expression itself and its other components (see
previous examples). Then we assign to the variables the various combinations of True (1) and
False (0) values that they may take. Each row corresponds to a possible outcome
and the rows together form the set of all possible outcomes. There is, for example, a possible
outcome in which P is a true statement while Q is false.

1.2.3.2 Axiomatic procedural decisions

The axiomatization of a theory implies, besides its formalization, that we start from a
finite number of axioms and that, through the controlled transformation of these, we can obtain
all the theorems of the theory. So we start from a few axioms whose truth is posited (not
proven). We then determine deduction rules for manipulating the axioms or the expressions
obtained from them. The sequence of these deductions is a proof that leads to a theorem, a law
or a lemma.

We will now briefly present two axiomatic systems, each consisting of axioms together with two
specific rules named "inference rules" (intuitive rules):

1. The "modus ponens": if we have proved A and $A \Rightarrow B$, then we can deduce B. A is
named the "minor premise" and $A \Rightarrow B$ the "major premise" of the modus ponens rule.

From:

$$x > y$$

and:

$$(x > y) \Rightarrow (y < x)$$

we can deduce that:

$$y < x$$



Humans typically communicate in a way that resists shallow logical analysis. In a 
real conversation, people use words rather than terms, make utterances rather than 
sentences, and employ a wider variety of inference methods than modus ponens. 
A great deal of what is communicated and inferred in a conversation depends on 
context, the speakers and audience, their history, their shared knowledge and con- 
fidences, the feelers they lay out to establish mutual trust and rapport. 


2. The "substitution": we can in a schema of axioms replace a letter by any formula, at the 
condition that all identical letters are replaced by identical formulas ! 

Let us give as an example, two axiomatic systems: the axiomatic system of Whitehead 
and Russell, the axiomatic system of Lukasiewicz. 

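(a) The axiomatic system of Whitehead and Russell adopts $\neg$, $\vee$ as primitive symbols
and defines $\Rightarrow$, $\wedge$, $\Leftrightarrow$ from them as follows (relations easily
verifiable with truth tables):

$$(A \Rightarrow B) \Leftrightarrow \neg A \vee B$$

$$(A \wedge B) \Leftrightarrow \neg(\neg A \vee \neg B)$$

$$(A \Leftrightarrow B) \Leftrightarrow (A \Rightarrow B) \wedge (B \Rightarrow A)$$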


This system includes 5 axioms, all fairly obvious, plus the two rules of inference.
The axioms are given here using non-primitive symbols, as Whitehead and Russell did:

A1. $(A \vee A) \Rightarrow A$

A2. $B \Rightarrow (A \vee B)$

A3. $(A \vee B) \Rightarrow (B \vee A)$

A4. $(A \vee (B \vee C)) \Rightarrow (B \vee (A \vee C))$

A5. $(B \Rightarrow C) \Rightarrow ((A \vee B) \Rightarrow (A \vee C))$

We have already presented some of these above.
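Since every theorem of the system must be a law of propositional logic, the axioms themselves should be tautologies. The following Python sketch checks this by brute force, assuming the five axioms in their standard Principia Mathematica form (written out in the dictionary below) and encoding the implication as "not A or B"; the names `implies`, `axioms` and `results` are our own.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

# The five axioms, each written as a Boolean function of (a, b, c).
axioms = {
    "A1": lambda a, b, c: implies(a or a, a),
    "A2": lambda a, b, c: implies(b, a or b),
    "A3": lambda a, b, c: implies(a or b, b or a),
    "A4": lambda a, b, c: implies(a or (b or c), b or (a or c)),
    "A5": lambda a, b, c: implies(implies(b, c), implies(a or b, a or c)),
}

# An axiom is a tautology if it is True on all 8 assignments.
results = {
    name: all(f(*vals) for vals in product([True, False], repeat=3))
    for name, f in axioms.items()
}
print(results)
```

This is a semantic check only: it confirms that the axioms are valid, not that the system proves every tautology.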

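For example, to prove that $\neg A \vee A$ is a theorem, we can proceed as follows:

(1) $B \Rightarrow (A \vee B)$    Axiom A2

(2) $A \Rightarrow (A \vee A)$    (1) and substitution

(3) $(B \Rightarrow C) \Rightarrow ((A \vee B) \Rightarrow (A \vee C))$    Axiom A5

(4) $(B \Rightarrow C) \Rightarrow ((\neg A \vee B) \Rightarrow (\neg A \vee C))$    (3) and substitution

(5) $(B \Rightarrow C) \Rightarrow ((A \Rightarrow B) \Rightarrow (A \Rightarrow C))$    (4) and property of $\Rightarrow$

(6) $((A \vee A) \Rightarrow A) \Rightarrow ((A \Rightarrow (A \vee A)) \Rightarrow (A \Rightarrow A))$    (5) and substitution

(7) $(A \Rightarrow (A \vee A)) \Rightarrow (A \Rightarrow A)$    (6), axiom A1 and modus ponens

(8) $A \Rightarrow A$    (7), (2) and modus ponens

(9) $\neg A \vee A$    (8) and property of $\Rightarrow$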


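(b) The axiomatic system of Lukasiewicz includes three axioms, plus the two rules of
inference (modus ponens and substitution):

A1. $(A \Rightarrow B) \Rightarrow ((B \Rightarrow C) \Rightarrow (A \Rightarrow C))$

A2. $A \Rightarrow (\neg A \Rightarrow B)$

A3. $(\neg A \Rightarrow A) \Rightarrow A$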

Here is the proof of the first two axioms in the system of Russell and Whitehead. 
These are the formulas (6) and (16) of the following derivation: 


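(1) $(A \vee (B \vee C)) \Rightarrow (B \vee (A \vee C))$    Axiom A4

(2) $(\neg(B \Rightarrow C) \vee (\neg(A \vee B) \vee (A \vee C))) \Rightarrow (\neg(A \vee B) \vee (\neg(B \Rightarrow C) \vee (A \vee C)))$    (1) and substitution

(3) $\neg(A \vee B) \vee ((B \Rightarrow C) \Rightarrow (A \vee C))$    A5 on (2) and modus ponens

(4) $(A \vee B) \Rightarrow ((B \Rightarrow C) \Rightarrow (A \vee C))$    (3) by the property of $\Rightarrow$

(5) $(\neg A \vee B) \Rightarrow ((B \Rightarrow C) \Rightarrow (\neg A \vee C))$    (4) and substitution

(6) $(A \Rightarrow B) \Rightarrow ((B \Rightarrow C) \Rightarrow (A \Rightarrow C))$    (5) and property of $\Rightarrow$

(7) $(B \Rightarrow (A \vee B)) \Rightarrow (((A \vee B) \Rightarrow (B \vee A)) \Rightarrow (B \Rightarrow (B \vee A)))$    (6) and substitution

(8) $((A \vee B) \Rightarrow (B \vee A)) \Rightarrow (B \Rightarrow (B \vee A))$    (7), axiom A2 and modus ponens

(9) $B \Rightarrow (B \vee A)$    (8), axiom A3 and modus ponens

(10) $\neg B \Rightarrow (\neg B \vee A)$    (9) and substitution

(11) $\neg\neg B \vee (\neg B \vee A)$    (10) and property of $\Rightarrow$

(12) $(\neg\neg B \vee (\neg B \vee A)) \Rightarrow (\neg B \vee (\neg\neg B \vee A))$    A4 and substitution

(13) $\neg B \vee (\neg\neg B \vee A)$    (12), (11) and modus ponens

(14) $B \Rightarrow (\neg\neg B \vee A)$    (13) and property of $\Rightarrow$

(15) $B \Rightarrow (\neg B \Rightarrow A)$    (14) and property of $\Rightarrow$

(16) $A \Rightarrow (\neg A \Rightarrow B)$    (15) and substitution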

These axiomatizations allow us to recover all tautologies, or laws of propositional logic, as
theorems. From everything that has been said so far, we can try to define what a proof is.

Definition (#10): A finite sequence of formulas $B_1, B_2, \ldots, B_m$ is named a "proof" from
the assumptions/hypotheses $A_1, A_2, \ldots, A_n$ if for each i:

• $B_i$ is one of the assumptions/hypotheses $A_1, A_2, \ldots, A_n$

• or $B_i$ is a variant of an axiom

• or $B_i$ is inferred (by application of the modus ponens rule) from the major premise
$B_j$ and the minor premise $B_k$, where $j, k < i$

• or $B_i$ is inferred (by application of the substitution rule) from an earlier premise $B_j$,
the replaced variable not appearing in $A_1, A_2, \ldots, A_n$

Such a sequence of formulas, $B_m$ being the final formula of the sequence, is more explicitly
named a "proof of $B_m$" from the assumptions/hypotheses (or axioms) $A_1, A_2, \ldots, A_n$,
which we also write:

$$A_1, A_2, \ldots, A_n \vdash B_m$$
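The definition above is precise enough to be checked by a program. The following Python sketch is a deliberately reduced version under stated assumptions: formulas are atoms (strings) or implications encoded as tuples `("=>", A, B)`, axioms must be listed explicitly, and only the modus ponens rule is implemented (the substitution rule is left out). The names `is_proof` and `imp` are hypothetical.

```python
def imp(a, b):
    """Encode the formula a => b as a tuple."""
    return ("=>", a, b)

def is_proof(steps, assumptions, axioms=()):
    """Check that each step is an assumption, a listed axiom, or follows
    from two earlier steps by modus ponens (substitution is not handled)."""
    for i, b in enumerate(steps):
        earlier = steps[:i]
        if b in assumptions or b in axioms:
            continue
        # Modus ponens: an earlier step a (minor premise) and an earlier
        # step a => b (major premise) together justify b.
        if any(s == imp(a, b) for s in earlier for a in earlier):
            continue
        return False
    return True

# The modus ponens example of the text: from x > y and (x > y) => (y < x).
assumptions = ["x > y", imp("x > y", "y < x")]
steps = ["x > y", imp("x > y", "y < x"), "y < x"]
print(is_proof(steps, assumptions))
```

A sequence that states the conclusion without first listing its premises is rejected, which reflects the requirement $j, k < i$ of the definition.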


More explicitly a proof is a deductive argument for a mathematical statement. In the argument, 
other previously established statements, such as theorems, can be used. In principle, a proof 
can be traced back to self-evident or assumed statements, known as axioms. 

Proofs employ logic but usually include some amount of natural language, which usually admits
some ambiguity. In fact, the vast majority of proofs in written mathematics can be considered
as applications of rigorous informal logic. Purely formal proofs, written in symbolic language
instead of natural language, are considered in proof theory.


Note that when we try to prove a result from a number of assumptions, we do not try to 
prove the assumptions themselves ! 


1.2.4 Quantifiers 

We have to complement the connectors of propositional calculus with what we name
"quantifiers" if we wish to solve certain problems. Indeed, propositional calculus does not
allow us to state general things about the elements of a set, for example. In that sense,
propositional logic covers only part of reasoning. Predicate calculus, on the contrary, allows
us to formally handle statements such as "there exists an x such that [x has an American car]" or
"for all x [if x is a dachshund, then x is small]". In short, we extend the composed formulas
in order to express existential quantification ("there exists...") and universal quantification
("for every..."). The examples we just gave involve somewhat special propositions such as
"x has an American car". This is a proposition with a variable. Such propositions are in fact
the application of a function to x: the function that associates "x has an American car" with x.
We denote this function by "... has an American car" and we say that it is a propositional
function, because it is a function whose value is a proposition, or a "predicate" as we already
know.

The existential and universal quantifiers go hand in hand with the use of propositional functions.
Predicate calculus is, however, limited in its existential and universal formulas. Thus, we
do not allow ourselves sentences like "there is an affirmation about x such that ...". In fact,
we allow ourselves to quantify only over "individuals". This is why predicate logic is named
"first-order logic": its variables range over basic mathematical objects (while in second-order
logic they can also range over sets).

First -Order Logic: 

Second - Order Logic: 

Before turning to the study of the predicate calculus we define: 

D1. The "universal quantifier": $\forall$ (for all)

D2. The "existential quantifier": $\exists$ (exists)



To state that any complex number is the product of a non-negative number and a number of
modulus 1, we write:

$$\forall z \in \mathbb{C}, \exists \rho \in \mathbb{R}_{\geq 0}, \exists u \in \mathbb{C} : (|u| = 1 \wedge z = \rho u) \tag{1.28}$$

The order of quantifiers is critical to meaning, as is illustrated by the following two propositions:

For every natural number n, there exists a natural number s such that $s = n^2$.

This is clearly true! It just asserts that every natural number has a square. The meaning of the
assertion in which the quantifiers are swapped is different:

There exists a natural number s such that for every natural number n, $s = n^2$.

This is clearly false! It asserts that there is a single natural number s that is at the same time the
square of every natural number.
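On a bounded domain, the two propositions can be evaluated directly with Python's `all` (universal quantifier) and `any` (existential quantifier). This is only a sketch under an explicit assumption: the natural numbers are replaced by the finite ranges `N` and `S` below, with `S` chosen large enough to contain the square of every element of `N`.

```python
N = range(0, 10)     # bounded stand-in for the natural numbers n
S = range(0, 100)    # candidates for s; contains n**2 for every n in N

# "For every n there exists s such that s = n^2"
forall_exists = all(any(s == n**2 for s in S) for n in N)

# "There exists s such that for every n, s = n^2"
exists_forall = any(all(s == n**2 for n in N) for s in S)

print(forall_exists, exists_forall)
```

Swapping the nesting of `all` and `any` is exactly what swapping the quantifiers means, and it changes the truth value of the statement.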

A frequent question in physics and mathematics is whether the universal quantifier has to be
placed before or after the predicate it refers to. Strictly in terms of formal logic, quantifiers
are always at the beginning of a formula. However, almost no one gives a proof written
in the formal language: even simple proofs would be very long and unreadable. On the other hand,
anyone, regardless of the natural language they speak, will interpret a sentence in the formal
language in the same way. The price of this clarity is, of course, readability. Natural languages,
because of their inherent ambiguity, are subject to many more limitations.

Obviously, the choice between a formal notation and a more informal one depends on the context
of presentation. What matters is to whom we communicate an idea, and this should
guide us towards a suitable level of formal notation.

We sometimes use the symbol $\exists!$ to say briefly: "there is one and only one".


A famous example is the way to make explicit that the logarithm is a bijective function:

$$\forall x \in \mathbb{R}^+, \exists! y \in \mathbb{R}^+ : x = \ln(y) \tag{1.29}$$

We will see now that Proof Theory and Set Theory are the exact transcription of the principles
and results of Logic (the one with a capitalized "L").



1.3 Predicate Calculus 

In mathematics courses (algebra, analysis, geometry, ...), we prove the properties of different
types of objects (integers, reals, matrices, sequences, continuous functions, curves, triangles, ...).
To prove these properties, we need of course the objects on which we work to be clearly defined
(what is a set, what is a real number, what is a point, ...?).

In first-order logic and, in particular, in proof theory, the objects we study are formulas and
proofs. We must therefore give a precise definition of what these objects are. Terms and
formulas make up the grammar of a language that is deliberately simplified and designed to say
exactly what we want, without ambiguity and without unnecessary detours.

1.3.1 Grammar 

Definitions (#11): 

D1. The "terms" designate the objects about which we want to prove properties (we will discuss them in much more detail further below):

• In algebra, the terms refer to the elements of a set (group, ring, field, vector space, etc.). We also manipulate sets of objects (subgroups, subrings, subfields, etc.). The terms that designate these sets of objects are named "second-order terms".

• In analysis, the terms refer most of the time to real numbers or (for example, if we place ourselves in function spaces) to functions.

D2. The "formulas" are the properties of the objects we study (we will also discuss them in much more detail further below):

• In algebra, we can write formulas to express that two elements commute, that a subspace is of dimension 3, etc.

• In analysis, we will write formulas to express the continuity of a function, the convergence of a sequence, etc.

• In set theory, formulas can express the inclusion of one set in another, the membership of an element in a set, etc.

D3. The "proofs" make it possible to check whether a formula is true. The precise meaning of this word will also need to be defined. More precisely, proofs are deductions under assumptions: they allow us to "lead from truth to truth". The question of the truth of the conclusion is thereby referred back to that of the hypotheses, which is not a matter of logic but rests on the knowledge we have of the things we are talking about.


1.3.2 Language 

In mathematics we use, depending on the area, different languages that are distinguished by the symbols they use. The definition below simply expresses that giving the list of symbols is sufficient to specify the language.

Definition (#12): A "language" is given by a family (not necessarily finite) of symbols. We distinguish three kinds of constituents of a language: symbols, terms and formulas.

R1. We sometimes use the word "vocabulary" or "signature" instead of the word "language".

R2. We already know that the word "predicate" is used instead of the word "relation". We then speak of "predicate calculus" instead of "first-order logic".


1.3.2.1 Symbols

There are different types of symbols we will try to define: 

D1. The "constant symbols" (see note below).

Example: the neutral element n in Set Theory (see section Set Theory).

D2. The "function symbols" or "functors". To each function symbol is assigned a strictly positive integer that we name its "arity": this is the number of arguments of the function. If the arity is 1 (resp. 2, ..., n), we say that the function is unary (resp. binary, ..., n-ary).

Example: the binary functor of multiplication × or · in group theory (see section Set Theory).

D3. The "relation symbols". As in the previous definition, every relation symbol is associated with a non-negative integer (its arity) that corresponds to its number of arguments, and we then speak of unary, binary, ..., n-ary relations.

Example: the relation = is a binary relation (see section Set Theory).



D4. The "individual variables". In what follows we assume given an infinite set $V$ of variables. As is traditional, the variables are denoted by lowercase Latin letters: $x$, $y$, $z$ (possibly indexed: $x_1$, $x_2$, $x_3$).

D5. To these we must add the connectors $\lor$, $\land$ and the quantifiers $\forall$, $\exists$, $\exists!$ that we discussed extensively above and to which there is no need to return.


R1. A constant symbol can be seen as a function symbol with no argument (arity zero).

R2. We consider (unless otherwise stated) that each language contains the binary relation symbol = (read "equal") and the relation symbol with zero arguments denoted $\bot$ (read "bottom" or "absurd"), representing, as we already know, the value FALSE. In the description of a language, we will often omit to mention them. The symbol $\bot$ is often redundant: we can indeed, without using it, write a formula that is always false. However, it makes it possible to represent FALSE in a canonical way and therefore to write general proof rules.

R3. The roles of functions and relations are very different. As we will see, the function symbols are used to construct the terms (the objects of the language) and the relation symbols to build the formulas (the properties of these objects).

1.3.2.2 Terms

The terms, also called "first-order terms", are the objects associated with the language.

D1. Given a language $\mathcal{L}$, the set $\mathcal{T}$ of terms on $\mathcal{L}$ is the smallest set that contains the variables and the constants and that is stable (we do not leave the set) under the application of the function symbols of $\mathcal{L}$ to terms.

D2. A "closed term" is a term that contains no variables (and is therefore built from constants only).

D3. For a more formal definition, we can write:

$\mathcal{T}_0 = \{t\}$ (1.30)

where $t$ is a variable or a constant symbol and, for any $k \in \mathbb{N}$:

$\mathcal{T}_{k+1} = \mathcal{T}_k \cup \{f(t_1, \ldots, t_n) \mid t_i \in \mathcal{T}_k\}$ (1.31)

where $f$ is a function symbol of arity $n$ (let us recall that the arity is the number of arguments of the function). Thus, each level $k$ has its own set of terms $\mathcal{T}_k$, and we have:

$\mathcal{T} = \bigcup_{k \in \mathbb{N}} \mathcal{T}_k$




D5. We name "height" of a term $t$ the smallest $k$ such that $t \in \mathcal{T}_k$.

This definition means that variables and constants are terms and that, if $f$ is an $n$-ary function symbol and $t_1, \ldots, t_n$ are terms, then $f(t_1, \ldots, t_n)$ is itself a term. The set $\mathcal{T}$ of terms is defined by the grammar:

$\mathcal{T} = V \mid S_c \mid S_f(\mathcal{T}, \ldots, \mathcal{T})$

This expression must be read as follows: an element of the set $\mathcal{T}$ that we are defining is either an element of $V$ (a variable), or an element of $S_c$ (the set of constant symbols), or the application of an $n$-ary function symbol $f \in S_f$ to $n$ elements of $\mathcal{T}$ (constants, variables or other terms).

Caution: The fact that $f$ has the right arity is only implicit in this notation. Moreover, writing $S_f(\mathcal{T}, \ldots, \mathcal{T})$ does not mean that all the arguments of the function are identical, but simply that these arguments are elements of $\mathcal{T}$.


It is often convenient to see a term (expression) as a tree, where each node is labeled with a function symbol (operator or function) and each leaf with a variable or a constant.


In what follows, we will almost always define concepts (or prove results) "by induction" on the structure or the size of a term.

Definitions (#13): 

D1. To prove a property $P$ on the terms, it suffices to prove $P$ for the variables and constants and to prove $P(f(t_1, \ldots, t_n))$ from $P(t_1), \ldots, P(t_n)$. We then do a "proof by induction on the height of a term". It is a technique that we will meet again in what follows.

Mathematical induction as an inference rule can be formalized as a second-order axiom. The axiom of induction is, in logical symbols:

$\forall P.\,[[P(0) \land \forall (k \in \mathbb{N}).\,[P(k) \Rightarrow P(k+1)]] \Rightarrow \forall (n \in \mathbb{N}).\,P(n)]$ (1.34)

In words, the basis $P(0)$ and the inductive step (namely, that the inductive hypothesis $P(k)$ implies $P(k+1)$) together imply $P(n)$ for any natural number $n$. The axiom of induction asserts the validity of inferring that $P(n)$ holds for any natural number $n$ from the basis and the inductive step.

Induction can be compared to falling dominoes: whenever one domino falls, the next one also falls. The first step, proving that the basis $P(0)$ is true, starts the infinite chain reaction.



D2. To define a function $\Phi$ on the terms, it is enough to define it on the variables and constants and to say how we get $\Phi(f(t_1, \ldots, t_n))$ from $\Phi(t_1), \ldots, \Phi(t_n)$. Here again we do a "definition by induction on the height of a term".
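A definition by induction on terms translates directly into a recursive function on a tree. The sketch below is only an illustration: the class names Var, Const and App and the sample term are assumptions of this example, not notation from the text. It defines the height of a term and the number of function symbols it contains, each by giving its value on variables and constants and then on $f(t_1, \ldots, t_n)$.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str                      # an individual variable, e.g. x


@dataclass(frozen=True)
class Const:
    name: str                      # a constant symbol, e.g. e


@dataclass(frozen=True)
class App:
    symbol: str                    # a function symbol f
    args: Tuple["Term", ...]       # its arguments t_1, ..., t_n (n >= 1)


Term = Union[Var, Const, App]


def height(t: Term) -> int:
    """Smallest k such that t belongs to T_k."""
    if isinstance(t, (Var, Const)):
        return 0                   # base case: variables and constants
    return 1 + max(height(a) for a in t.args)


def size(t: Term) -> int:
    """Number of function symbols occurring in t."""
    if isinstance(t, (Var, Const)):
        return 0
    return 1 + sum(size(a) for a in t.args)


# t = mul(add(x, e), y): two function nodes, three leaves
t = App("mul", (App("add", (Var("x"), Const("e"))), Var("y")))
print(height(t))  # 2
print(size(t))    # 2
```

Both functions have the same shape: one clause for the leaves and one clause that combines the values already computed for the arguments.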


Example: the size (we also say the "length") of a term $t$, denoted $\tau(t)$, is the number of function symbols occurring in $t$. Formally:

• $\tau(x) = \tau(c) = 0$ if $x$ is a variable and $c$ is a constant.

• $\tau(f(t_1, \ldots, t_n)) = 1 + \sum_{1 \le i \le n} \tau(t_i)$

where the 1 in the last relation accounts for the function symbol $f$ itself.

1.3.2.3 Formulas

Definition (#14): A "well-formed formula" (WFF), often simply "formula", is a word (i.e. a finite sequence of symbols from a given alphabet) that is part of a formal language. A formal language can be considered to be identical to the set containing all and only its formulas.

The formulas of propositional calculus, also named "propositional formulas", are expressions 
such as (A A (B V C )). 

An atomic formula is a formula that contains no logical connectives nor quantifiers, or equiva- 
lently a formula that has no strict subformulas. The precise form of atomic formulas depends on 
the formal system under consideration; for propositional logic, for example, the atomic formu- 
las are the propositional variables. For predicate logic, the atoms are predicate symbols together 
with their arguments, each argument being a term. 

[Figure: diagram with regions labeled "Symbols and strings of symbols" and "Well-formed formulas"]



Definition (#15): Formulas are built from "atomic formulas" using connectors and quantifiers. We will use the following connectors and quantifiers (which we already know):

• Unary negation connector: $\neg$

• Binary connectors of conjunction, disjunction and implication: $\land$, $\lor$, $\to$

• Quantifiers: $\exists$, which must be read "there exists", and $\forall$, which must be read "for all"

This notation for the connectors is almost standard (it should be, at least). It is used to avoid confusion between the formulas and the ordinary language (metalanguage).

Definitions (#16): 

D1. Given a language $\mathcal{L}$, the "atomic formulas" of $\mathcal{L}$ are the formulas of the form $R(t_1, \ldots, t_n)$ where $R$ is an $n$-ary relation symbol of $\mathcal{L}$ and $t_1, \ldots, t_n$ are terms of $\mathcal{L}$. We denote by "Atom" the set of all atomic formulas. If we denote by $S_r$ the set of relation symbols, we can write the set of all terms related to each other as:

$\text{Atom} = S_r(\mathcal{T}, \ldots, \mathcal{T})$

The set $\mathcal{F}$ of formulas of the first-order logic of $\mathcal{L}$ is thus defined by the grammar (where $x$ is a variable):

$\mathcal{F} = \text{Atom} \mid \mathcal{F} \land \mathcal{F} \mid \mathcal{F} \lor \mathcal{F} \mid \mathcal{F} \to \mathcal{F} \mid \neg\mathcal{F} \mid \exists x\,\mathcal{F} \mid \forall x\,\mathcal{F}$

which should be read: the set of formulas is the smallest set containing the atomic formulas and such that, if $F_1$ and $F_2$ are formulas, then $F_1 \lor F_2$, etc. are formulas and can be related to each other.

The reader must be careful not to confuse terms and formulas: $\sin(x)$ is a term (function), $x = 3$ is a formula. But $\sin(x) \land (x = 3)$ is nothing: we cannot, in fact, put a connector between a term and a formula (it is meaningless).


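R1. To define a function $\Phi$ on formulas, we simply need to define $\Phi$ on the atomic formulas and to say how $\Phi$ of a compound formula is obtained from $\Phi$ of its immediate sub-formulas.

R2. To prove a property $P$ on formulas, it suffices to prove $P$ for the atomic formulas and to prove that each connector and each quantifier preserves $P$.

R3. To prove a property $P$ on formulas, it is enough to assume that the property holds for all formulas of size $p < n$ and to prove the property for formulas of size $n$.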

D2. A "sub-formula" of a formula (or expression) $F$ is one of its components, that is, a formula from which $F$ is built. Formally, we define the set $SF(F)$ of the sub-formulas of $F$ by:

• If $F$ is atomic:

$SF(F) = \{F\}$ (1.37)

• If $F = F_1 \oplus F_2$ (that is to say a composition) with $\oplus \in \{\lor, \land, \to\}$:

$SF(F) = \{F\} \cup SF(F_1) \cup SF(F_2)$ (1.38)

• If $F = \neg F_1$ or $F = Qx\,F_1$ with $Q \in \{\forall, \exists\}$:

$SF(F) = \{F\} \cup SF(F_1)$ (1.39)

D3. A formula $F$ of $\mathcal{L}$ uses only a finite number of symbols of $\mathcal{L}$. This subset is named the "language of the formula" and is denoted by $\mathcal{L}(F)$.

D4. The "size (or length) of a formula $F$", denoted by $\tau(F)$, is the number of connectors and quantifiers occurring in $F$. Formally:

• $\tau(F) = 0$ if $F$ is an atomic formula

• $\tau(F_1 \oplus F_2) = 1 + \tau(F_1) + \tau(F_2)$ where once again $\oplus \in \{\lor, \land, \to\}$

• $\tau(\neg F_1) = \tau(Qx\,F_1) = 1 + \tau(F_1)$ with once again $Q \in \{\forall, \exists\}$

D5. The "main operator" (we also say the "main connector") of a formula is defined as follows:

• If $A$ is atomic, then it has no main operator

• If $A = \neg B$, then $\neg$ is the main operator of $A$

• If $A = B \oplus C$ where once again $\oplus \in \{\lor, \land, \to\}$, then $\oplus$ is the main operator of $A$

• If $A = Qx\,B$ where once again $Q \in \{\forall, \exists\}$, then $Q$ is the main operator of $A$

D6. Let $F$ be a formula. The set $VL(F)$ of free variables of $F$ and the set $VM(F)$ of dummy variables (or "linked variables") of $F$ are defined by induction on $\tau(F)$.

An occurrence of a given variable in a formula $F$ is named a "linked" or "dummy" occurrence if a quantifier of this formula refers to it. Otherwise, we say that it is a "free" occurrence.

An occurrence of a variable $x$ in a formula $F$ is a position of the variable in the formula $F$. Do not confuse it with the object that is the variable itself!

To make the possible free variables of a formula $F$ explicit, we write $F[x_1, \ldots, x_n]$. This means that the free variables of $F$ are among the variables $x_1, \ldots, x_n$: if $y$ is free in $F$, then $y$ is one of the $x_i$, but the $x_i$ do not necessarily all appear in $F$.

We can define the dummy and free variables more formally:

(a) If $F = R(t_1, \ldots, t_n)$ is atomic, then $VL(F)$ is the set of variables appearing in the $t_i$, and for the dummy variables we have $VM(F) = \emptyset$.

(b) If $F = F_1 \oplus F_2$ where $\oplus \in \{\lor, \land, \to\}$, then $VL(F) = VL(F_1) \cup VL(F_2)$ and $VM(F) = VM(F_1) \cup VM(F_2)$.

(c) If $F = \neg F_1$, then $VL(F) = VL(F_1)$ and $VM(F) = VM(F_1)$.

(d) If $F = Qx\,F_1$ with $Q \in \{\forall, \exists\}$, then $VL(F) = VL(F_1) - \{x\}$ and $VM(F) = VM(F_1) \cup \{x\}$.


E1. Given $F$: $\forall x\,(x \cdot y = y \cdot x)$, then $VL(F) = \{y\}$ and $VM(F) = \{x\}$.

E2. Given $G$: $\{\forall x\,\exists y\,(x \cdot z = z \cdot y)\} \land \{x = z \cdot z\}$, then $VL(G) = \{x, z\}$ and $VM(G) = \{x, y\}$.

D7. We say that formulas $F$ and $G$ are "$\alpha$-equivalent" if they are (syntactically) identical up to the renaming of their linked variables.

D8. A "closed formula" is a formula without free variables.

D9. Given a formula $F$, a variable $x$ and a term $t$, $F[x := t]$ is the formula obtained by replacing in $F$ all free occurrences of $x$ by $t$, after possibly renaming the linked variables of $F$ that occur free in $t$.


R1. Notice in the examples seen previously that a variable can have both free occurrences and linked occurrences. So we do not always have $VL(F) \cap VM(F) = \emptyset$.

R2. We cannot rename $y$ into $x$ in the formula $\forall y\,(x \cdot y = y \cdot x)$ (a variant of the previous example) and get the formula $\forall x\,(x \cdot x = x \cdot x)$: the variable $x$ would then be "captured". We therefore cannot rename variables without precautions: we must avoid the capture of free variables.

1.4 Proofs

The proofs that we find in books on mathematics or theoretical physics are assemblies of mathematical symbols and sentences containing keywords such as: "so", "because", "if", "if and only if", "it is necessary", "just", "take an x such that", "therefore", "assume", "seek a contradiction", etc. These words are assumed to be understood by everyone in the same way, which is, in fact, not always the case.

In any work, the purpose of a proof is to convince the reader of the truth of a statement by showing the intellectual path that allows the reader to check for themselves the truth and rigour of the statement. Depending on the level of the reader, this proof will be more or less detailed or formal: something that can be considered obvious in a graduate course may not be so in an undergraduate course.

In a homework assignment, the corrector knows that the result the student is asked for is (normally) true and knows its proof. The student must prove (correctly) the required result. The level of detail that the student must give sometimes depends on the confidence the corrector has in the student: in a good paper, a "proof by an obvious induction" will be accepted, while in a paper that previously contained an "obvious" claim which was ... obviously false, it will not pass!

To properly manage the level of detail, we should know what a complete proof is. This work of formalization was done only at the beginning of the 20th century!

Several things may seem surprising: 

1. There is only a finite number of rules: two for each of the connectors (and for equality) plus three general rules. It was not at all clear a priori that a finite number of rules would be enough to prove all that is true. We will show this result (this is essentially the Completeness Theorem). The proof is not trivial at all.

2. These are the same rules for all of mathematics and physics: algebra, analysis, geometry, etc. This means that we have managed to isolate what is general in reasoning! We will see later that a proof is an assembly of pairs $(\Gamma, A)$, where $\Gamma$ is a set of formulas (the assumptions) and $A$ a formula (the conclusion). When we do arithmetic, geometry or real analysis, we use, in addition to the rules, assumptions that are named, as we know, "axioms". These express the particular properties of the objects that we manipulate (for details on the concept of axiom, see the Introduction section of the book).

We therefore prove formulas, in general, using a set of assumptions, and this set can vary during the proof: when we say "suppose F and let us prove G", F is a new hypothesis that we can use to prove G. To formalize this, we introduce the concept of "sequent":

D1. A "sequent" is a pair $\Gamma \vdash F$ where:

(a) $\Gamma$ is a finite set of formulas that represents the assumptions that we can use. This set is also named the "context of the sequent".

(b) $F$ is a formula. This is the formula we want to prove. We say that this formula is the "conclusion of the sequent".

The sign "$\vdash$" must be read "thesis" or "prove that".

D2. A sequent $\Gamma \vdash F$ is said to be "provable" (or demonstrable) if it can be obtained by a finite number of applications of the rules. A formula $F$ is provable if the sequent $\vdash F$ is provable.


1.4.1 Rules of Proofs 

Proof rules are the bricks used to build the steps of a demonstration. A formal demonstration is a finite (and correct!) assembly of rules. This assembly is not linear (not a sequence) but a "tree". Indeed, we are often forced to branch.

We will present one choice of rules. We could have introduced others (instead of or in addition to them) that would give the same notion of provability. Those that have been chosen are "natural" and correspond to the arguments that we usually make in mathematics. In common practice we use, in addition to the rules below, many other rules, but these can be deduced from the previous ones. We name them "derived rules".

It is traditional to write the root of the tree (the conclusion sequent) at the bottom and the leaves at the top, as in nature. Since it is also traditional to write on a sheet of paper from top to bottom, it would not be unreasonable to write the root at the top and the leaves at the bottom. We must make a choice!

A rule consists of: 

• a set of "premises", each of which is a sequent. There may be zero, one or more of them.

• the conclusion sequent of the rule

• a horizontal bar separating the premises (top) from the conclusion (bottom). On the right of the bar, we indicate the name of the rule.

Example:

$\dfrac{\Gamma \vdash A \to B \qquad \Gamma \vdash A}{\Gamma \vdash B}\ \to_e$

This rule has two premises ($\Gamma \vdash A \to B$ and $\Gamma \vdash A$) and a conclusion ($\Gamma \vdash B$), and its name is abbreviated as $\to_e$. It can be read in two ways:

• from bottom to top: if we want to prove the conclusion, it suffices, by using the rule, to prove the premises. This is what we do when we search for a proof. This corresponds to the "analysis".

• from top to bottom: if we have proved the premises, then we have also proved the conclusion. This is what we do when we write a proof. This corresponds to the "synthesis".



There are 17 proof rules, which we define below:

1. Axiom:

$\dfrac{}{\Gamma, A \vdash A}\ \text{ax}$

From bottom to top: if the conclusion of the sequent is one of the hypotheses, then the sequent is provable.

2. Weakening:

$\dfrac{\Gamma \vdash A}{\Gamma, B \vdash A}\ \text{(weakening)}$

• From top to bottom: if we have proved $A$ under the assumptions $\Gamma$, then $A$ can still be proved after adding other hypotheses.

• From bottom to top: some of the assumptions may be unnecessary.

3. Introduction of implication:

$\dfrac{\Gamma, A \vdash B}{\Gamma \vdash A \to B}\ \to_i$

From bottom to top: to prove $A \to B$, we assume $A$ (that is to say, we add it to the assumptions) and we prove $B$.

4. Elimination of implication:

$\dfrac{\Gamma \vdash A \to B \qquad \Gamma \vdash A}{\Gamma \vdash B}\ \to_e$

From bottom to top: to prove $B$, if we know a theorem of the form $A \to B$ (or if we can prove the lemma $A \to B$), it suffices to prove $A$.

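5. Introduction of the conjunction:

$\dfrac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \land B}\ \land_i$

From bottom to top: to prove $A \land B$, it suffices to prove $A$ and to prove $B$.

6. Elimination of the conjunction:

$\dfrac{\Gamma \vdash A \land B}{\Gamma \vdash A}\ \land_e^l$ or $\dfrac{\Gamma \vdash A \land B}{\Gamma \vdash B}\ \land_e^r$

From top to bottom: from $A \land B$, we can deduce $A$ (left elimination) and $B$ (right elimination).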

7. Introduction of the disjunction:

$\dfrac{\Gamma \vdash A}{\Gamma \vdash A \lor B}\ \lor_i^l$ or $\dfrac{\Gamma \vdash B}{\Gamma \vdash A \lor B}\ \lor_i^r$

From bottom to top: to prove $A \lor B$, it suffices to prove $A$ (left disjunction) or to prove $B$ (right disjunction).

8. Elimination of the disjunction:

$\dfrac{\Gamma \vdash A \lor B \qquad \Gamma, A \vdash C \qquad \Gamma, B \vdash C}{\Gamma \vdash C}\ \lor_e$

From bottom to top: if we want to prove $C$ and we know that $A \lor B$ holds, it is enough to prove $C$ first by assuming $A$ and then by assuming $B$. This is reasoning by cases.

9. Introduction of the negation:

$\dfrac{\Gamma, A \vdash \bot}{\Gamma \vdash \neg A}\ \neg_i$

From bottom to top: to show $\neg A$, we assume $A$ and we prove the absurd ($\bot$).

10. Elimination of the negation:

$\dfrac{\Gamma \vdash \neg A \qquad \Gamma \vdash A}{\Gamma \vdash \bot}\ \neg_e$

From top to bottom: if we have proved $\neg A$ and $A$, then we have proved the absurd ($\bot$).

11. Classical absurdity (reductio ad absurdum):

$\dfrac{\Gamma, \neg A \vdash \bot}{\Gamma \vdash A}\ \bot_c$

From bottom to top: to prove $A$, it suffices to prove the absurd by assuming $\neg A$.

This rule amounts to saying: $A$ is true if and only if it is false that $A$ is false. This rule is not obvious, but it is necessary for proving some results (there are results that we cannot prove without it). Unlike many others, this rule can be applied at any time. We can, in fact, always say: to prove $A$, I suppose $\neg A$ and I look for a contradiction.

12. Introduction of the universal quantifier:

$\dfrac{\Gamma \vdash A \qquad x \text{ is not free in the formulas of } \Gamma}{\Gamma \vdash \forall x\,A}\ \forall_i$

From bottom to top: to prove $\forall x\,A$, it suffices to prove $A$ without making any assumption about $x$.

13. Elimination of the universal quantifier:

$\dfrac{\Gamma \vdash \forall x\,A}{\Gamma \vdash A[x := t]}\ \forall_e$

From top to bottom: from $\forall x\,A$, we can deduce $A[x := t]$ for any term $t$. In other words: if we have proved $A$ for all $x$, then we can use $A$ with any object $t$.

14. Introduction of the existential quantifier:

$\dfrac{\Gamma \vdash A[x := t]}{\Gamma \vdash \exists x\,A}\ \exists_i$

From bottom to top: to prove $\exists x\,A$, it suffices to find an object (that is, a term $t$) for which we know how to prove $A[x := t]$.

15. Elimination of the existential quantifier:

$\dfrac{\Gamma \vdash \exists x\,A \qquad \Gamma, A \vdash C \qquad x \text{ is not free in the formulas of } \Gamma \text{ nor in } C}{\Gamma \vdash C}\ \exists_e$

From bottom to top: to prove $C$ when we know that $\exists x\,A$ holds under the assumptions $\Gamma$, we add $A$ as a new hypothesis and prove $C$. The variable $x$ must not be free in $\Gamma$ or in $C$: the only thing we are allowed to assume about $x$ is $A$, just as $x$ is not free in $\exists x\,A$.

16. Introduction of equality:

$\dfrac{}{\Gamma \vdash t = t}\ =_i$

From bottom to top: we can always prove $t = t$. This rule means that equality is reflexive (see section Operators).

17. Elimination of equality:

$\dfrac{\Gamma \vdash A[x := t] \qquad \Gamma \vdash t = u}{\Gamma \vdash A[x := u]}\ =_e$

From top to bottom: if we have proved $A[x := t]$ and $t = u$, then we have proved $A[x := u]$. This rule expresses that equal objects have the same properties. Notice however that the formulas (or relations) $t = u$ and $u = t$ are not, formally, identical. We will have to prove that equality is symmetric (and we will take the opportunity to prove along the way that equality is transitive).
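To make the idea of "a proof is an assembly of rules" concrete, a few of the propositional rules above can be written as functions that take the premise sequents and return the conclusion sequent, refusing to do so when the premises do not have the required shape. This is only a sketch under stated assumptions: the names Sequent, axiom, imp_intro, imp_elim, and_intro and and_elim, and the tuple encoding of formulas, are choices made for this illustration, and only five of the 17 rules are covered.

```python
from typing import FrozenSet, NamedTuple


class Sequent(NamedTuple):
    hyps: FrozenSet    # the context Gamma (a finite set of formulas)
    concl: object      # the conclusion of the sequent


def axiom(hyps, a):
    """ax: Gamma, A |- A (no premise)."""
    hyps = frozenset(hyps)
    if a not in hyps:
        raise ValueError("ax: the conclusion must be one of the hypotheses")
    return Sequent(hyps, a)


def imp_intro(premise, a):
    """->i: from Gamma, A |- B infer Gamma |- A -> B."""
    return Sequent(premise.hyps - {a}, ("imp", a, premise.concl))


def imp_elim(major, minor):
    """->e: from Gamma |- A -> B and Gamma |- A infer Gamma |- B."""
    f = major.concl
    if major.hyps != minor.hyps:
        raise ValueError("->e: both premises must share the same context")
    if not (isinstance(f, tuple) and f[0] == "imp" and f[1] == minor.concl):
        raise ValueError("->e: premises do not have the form A -> B and A")
    return Sequent(major.hyps, f[2])


def and_intro(left, right):
    """and-i: from Gamma |- A and Gamma |- B infer Gamma |- A and B."""
    if left.hyps != right.hyps:
        raise ValueError("and-i: both premises must share the same context")
    return Sequent(left.hyps, ("and", left.concl, right.concl))


def and_elim(premise, side):
    """and-e: from Gamma |- A and B infer Gamma |- A (left) or B (right)."""
    f = premise.concl
    if not (isinstance(f, tuple) and f[0] == "and"):
        raise ValueError("and-e: the premise must be a conjunction")
    return Sequent(premise.hyps, f[1] if side == "left" else f[2])


# Derivation of  |- (A and B) -> (B and A), read from top to bottom.
h = ("and", "A", "B")
start = axiom({h}, h)               # A and B |- A and B     (ax)
b = and_elim(start, "right")        # A and B |- B           (and-e, right)
a = and_elim(start, "left")         # A and B |- A           (and-e, left)
swapped = and_intro(b, a)           # A and B |- B and A     (and-i)
theorem = imp_intro(swapped, h)     # |- (A and B) -> (B and A)   (->i)
print(theorem.hyps == frozenset(), theorem.concl)
```

Each variable holds one node of the proof tree, and the order of the assignments is the "top to bottom" reading of the rules. Because every function checks its premises, a sequent can only be built by a legal application of a rule.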

Let us now see three examples, presented in the form of theorems, as it should be in proof theory!


Theorem 4.1. Equality is symmetric (not entirely trivial, but a good example to begin with).

Proof 4.1.1.

$\dfrac{\dfrac{\dfrac{\dfrac{}{x_1 = x_2 \vdash x_1 = x_1}\ =_i \qquad \dfrac{}{x_1 = x_2 \vdash x_1 = x_2}\ \text{ax}}{x_1 = x_2 \vdash x_2 = x_1}\ =_e}{\vdash x_1 = x_2 \to x_2 = x_1}\ \to_i}{\vdash \forall x_1, x_2\,(x_1 = x_2 \to x_2 = x_1)}\ \forall_i \times 2$

From top to bottom: by the introduction of equality $=_i$ we obtain, under the assumption $x_1 = x_2$, the formula $x_1 = x_1$. At the same time, the axiom rule gives $x_1 = x_2$ under that same assumption. From these two premises, the elimination of equality $=_e$ (applied to the formula $x = x_1$, in which $x$ is replaced first by $x_1$ and then by $x_2$) gives $x_2 = x_1$ under the assumption $x_1 = x_2$. Then the introduction of implication discharges the assumption and gives $x_1 = x_2 \to x_2 = x_1$ without any assumption. Finally, we introduce the universal quantifier for each variable (i.e. twice), which is allowed since no assumption remains, and we conclude that equality is symmetric.

□ Q.E.D.

Theorem 4.2. Equality is transitive (that is to say, if $x_1 = x_2$ and $x_2 = x_3$ then $x_1 = x_3$).

Proof 4.2.1. Denoting by $F$ the formula $(x_1 = x_2) \land (x_2 = x_3)$:

$\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{}{F \vdash x_1 = x_2 \land x_2 = x_3}\ \text{ax}}{F \vdash x_1 = x_2}\ \land_e^l \qquad \dfrac{\dfrac{}{F \vdash x_1 = x_2 \land x_2 = x_3}\ \text{ax}}{F \vdash x_2 = x_3}\ \land_e^r}{F \vdash x_1 = x_3}\ =_e}{\vdash (x_1 = x_2 \land x_2 = x_3) \to x_1 = x_3}\ \to_i}{\vdash \forall x_1, x_2, x_3\,((x_1 = x_2 \land x_2 = x_3) \to x_1 = x_3)}\ \forall_i \times 3$

What do we do here? We first introduce the formula $F$ twice by the axiom rule in order to "dissect" it afterwards, on the left and on the right. We then eliminate the conjunction on the left and on the right to obtain each of the two equalities separately, and the elimination of equality combines them into $x_1 = x_3$ (by replacing $x_2$ with $x_3$ in $x_1 = x_2$). The introduction of implication then discharges the assumption $F$, and finally we state that the result is valid for any values of the three variables by introducing the universal quantifier three times.

□ Q.E.D.

And now a last, bigger example, again in the form of a theorem:

Theorem 4.3. Any involution is a bijection (see section Set Theory).

Proof 4.3.1. Let $f$ be a unary function symbol (with one variable). We write (for the details see the section Set Theory):

• Inj[f] the formula:

$\forall x, y : f(x) = f(y) \to x = y$ (1.60)

which means that $f$ is injective.

• Surj[f] the formula:

$\forall y, \exists x : f(x) = y$ (1.61)

• Bij[f] the formula:

$\text{Inj}[f] \land \text{Surj}[f]$ (1.62)

• Inv[f] the formula:

$\forall x : f(f(x)) = x$ (1.63)

which means that $f$ is an involution (we also write this $f \circ f = \text{Id}$, that is to say that the composition of $f$ with itself is the identity).

We would like to know whether:

$\text{Inv}[f] \vdash \text{Bij}[f]$

We will present this proof (trying to keep it as simple as possible) in four (!) different ways: traditional (informal), classic (pseudo-formal), formal as a tree and formal in-line.

• Traditional method (informal):

We must prove that if $f$ is involutive then it is bijective. So we have two things to prove (and both must be satisfied simultaneously): the function is injective and surjective.

1. We first prove that an involution is injective.

Assume that, for some $x$ and $y$:

$f(x) = f(y)$ (1.65)

We must show that this implies:

$x = y$ (1.66)

This follows from the definition of involution:

$\forall x : f(f(x)) = x, \quad \forall y : f(f(y)) = y$

and from the application of $f$ to both sides of the relation:

$f(x) = f(y)$

(thus three equalities so far), which gives:

$f(f(x)) = f(f(y))$

We therefore have:

$x = y$

2. Let us prove that an involution is surjective. If it is surjective, then we must have:

$\forall y\,\exists x : f(x) = y$ (1.71)

Given $y$, let us define the variable $x$ using the involution itself:

$x := f(y)$ (1.72)

Then $f(x) = f(f(y))$ and, by the definition of involution:

$f(f(y)) = y$ (1.73)

so that $f(x) = y$, and therefore surjectivity is ensured.

• Pseudo-formal method:

We take the same argument again and annotate it with the rules of proof theory.

We must show that if $f$ is involutive then it is bijective. So we have two things to show ($\land_i$) (and both must be satisfied simultaneously): that the function is injective and surjective:

$\text{Inj}[f] \land \text{Surj}[f]$ (1.74)

1. Let us first prove that an involution is injective. For arbitrary $x$ and $y$ ($\forall_i$), we assume:

$f(x) = f(y)$ (1.75)

and we must deduce ($\to_i$):

$x = y$ (1.76)

This follows from the definition of involution ($\forall_e$, ax):

$\forall x\ f(f(x)) = x, \quad \forall y\ f(f(y)) = y$ (1.77)

and from the application of $f$ to the relation:

$f(x) = f(y)$ (1.78)

(therefore three equalities, $=_e \times 2$), which gives:

$f(f(x)) = f(f(y))$ (1.79)

We therefore have:

$x = y$ (1.80)

2. Let us prove that an involution is surjective. If it is surjective, then we must have ($\forall_i$):

$\forall y\,\exists x : f(x) = y$ (1.81)

Given $y$, we define the variable $x$ using the involution itself ($\exists_i$):

$x := f(y)$ (1.82)

Then $f(x) = f(f(y))$ and, by the definition of involution ($\forall_e$, ax):

$f(f(y)) = y$ (1.83)

and therefore surjectivity is assured.


• Formal tree method:

Let us do this now with the graphical method that we have presented above.

1. Let us prove that an involution is injective.

For this we must first prove that:

$\dfrac{\dfrac{}{f(x) = f(y) \vdash f(f(x)) = f(f(x))}\ =_i \qquad \dfrac{}{f(x) = f(y) \vdash f(x) = f(y)}\ \text{ax}}{f(x) = f(y) \vdash f(f(x)) = f(f(y))}\ =_e\ (c)$

The latter relation is abbreviated $=_c$ and named (like other existing ones) a "derived rule", because it is an argument that is often made during proofs and is a little time-consuming to develop each time ...

That brings us to write:

$\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{}{\text{Inv}[f] \vdash \text{Inv}[f]}\ \text{ax}}{\text{Inv}[f] \vdash f(f(x)) = x}\ \forall_e \qquad \dfrac{\dfrac{}{\text{Inv}[f] \vdash \text{Inv}[f]}\ \text{ax}}{\text{Inv}[f] \vdash f(f(y)) = y}\ \forall_e \qquad f(x) = f(y) \vdash f(f(x)) = f(f(y))\ (c)}{\text{Inv}[f], f(x) = f(y) \vdash x = y}\ =_e\ (d)}{\text{Inv}[f] \vdash f(x) = f(y) \to x = y}\ \to_i}{\text{Inv}[f] \vdash \text{Inj}[f]}\ \forall_i$ (1.89)

2. Let us now prove that an involution is surjective:

$\dfrac{\dfrac{\dfrac{\dfrac{}{\text{Inv}[f] \vdash \forall x\,\{f(f(x)) = x\}}\ \text{ax}}{\text{Inv}[f] \vdash \{f(x) = y\}[x := f(y)]}\ \forall_e\ (e)}{\text{Inv}[f] \vdash \exists x\,\{f(x) = y\}}\ \exists_i}{\text{Inv}[f] \vdash \text{Surj}[f]}\ \forall_i$

It follows:

$\dfrac{\text{Inv}[f] \vdash \text{Inj}[f] \qquad \text{Inv}[f] \vdash \text{Surj}[f]}{\text{Inv}[f] \vdash \text{Inj}[f] \land \text{Surj}[f]}\ \land_i$


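• Formal in-line method:

We can do the same thing in a slightly less ... wide form ... and more ... tabbed (it is no less indigestible):

$\text{Inv}[f] \vdash \text{Bij}[f]$   ($\land_i$)   (1.92a)

    (1) $\text{Inv}[f] \vdash \text{Inj}[f]$   ($\forall_i$)

        $\text{Inv}[f] \vdash f(x) = f(y) \to x = y$   ($\to_i$)

        $\text{Inv}[f], f(x) = f(y) \vdash x = y$   ($=_e$)

            (i) $\text{Inv}[f] \vdash f(f(x)) = x$   ($\forall_e$)

                $\text{Inv}[f] \vdash \text{Inv}[f]$   (ax)

            (ii) $\text{Inv}[f] \vdash f(f(y)) = y$   ($\forall_e$)

                $\text{Inv}[f] \vdash \text{Inv}[f]$   (ax)

            (iii) $f(x) = f(y) \vdash f(f(x)) = f(f(y))$   ($=_e$)

                $f(x) = f(y) \vdash f(x) = f(y)$   (ax)

    (2) $\text{Inv}[f] \vdash \text{Surj}[f]$   ($\forall_i$)

        $\text{Inv}[f] \vdash \exists x\,\{f(x) = y\}$   ($\exists_i$)

        $\text{Inv}[f] \vdash \{f(x) = y\}[x := f(y)]$   ($\forall_e$)

        $\text{Inv}[f] \vdash \forall x\,\{f(f(x)) = x\}$   (ax)

□ Q.E.D.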
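As a sanity check of Theorem 4.3 (and not as a substitute for the proof), the statement can be tested exhaustively on small finite sets. The sketch below is an illustration under stated assumptions: a function on {0, ..., n-1} is represented by the tuple of its images, and the helper names are hypothetical.

```python
from itertools import product


def is_involution(f):
    """Inv[f]: f(f(x)) = x for every x."""
    return all(f[f[x]] == x for x in range(len(f)))


def is_injective(f):
    """Inj[f]: distinct arguments have distinct images."""
    return len(set(f)) == len(f)


def is_surjective(f):
    """Surj[f]: every element is the image of some x."""
    return set(f) == set(range(len(f)))


def every_involution_is_bijective(n):
    """Check the theorem on all n**n functions from {0..n-1} to itself."""
    maps = product(range(n), repeat=n)      # f[x] is the image of x
    return all(is_injective(f) and is_surjective(f)
               for f in maps if is_involution(f))


print(all(every_involution_is_bijective(n) for n in range(1, 6)))  # True
```

The witness used in the surjectivity part of the proof is visible here as well: for an involution stored as a tuple f, the element x = f[y] always satisfies f[x] == y.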


Version: 3.1 Revision 5 | Last update: 2015-09-06 16:33




The basis of mathematics, apart from reasoning (see section Proof Theory), is undoubtedly, for ordinary people, arithmetic. It is therefore mandatory that we stop on it to study its origin and some of its properties and consequences.

Numbers are the basis of Arithmetic, just as geometric figures are the basis of Geometry. They are also its historical basis, because mathematics probably started with the study of these objects, and its educational foundation, because it is by learning to count that we enter the world of mathematics.

The history of numbers, also sometimes called "scalars", is far too long to be told here, but we can recommend one of the best books on the subject: The Universal History of Numbers (2,000 pages in three volumes) by Georges Ifrah, ISBN: 2221057791.

Here is nevertheless a small slice of it that seems fundamental to us:

Our current decimal system, in base 10, uses the digits 0 to 9, called "Arabic numerals" although
they are in fact of Indian origin. The first digits seem to have been created in India in the third
century BC; the figures in Devanagari are associated with the Indian mathematician Brahmagupta
(7th century AD).

Indeed, the Arabic numerals (of Indian origin...) form the first line of the table below, and we see
that they are significantly different from the "Indian numerals" of the second line:




















Figure 4.2 - Indo-Arabic numbers 

This table should be read from left to right as follows: 0 "zero", 1 "one", 2 "two", 3 "three",
4 "four", 5 "five", 6 "six", 7 "seven", 8 "eight", 9 "nine". This system is much more efficient than
Roman numerals (try doing a calculation in Roman notation and you will see...).

It is commonly accepted that these numerals were introduced in Europe only around the year
1,000. Used in India, they were passed on by the Arabs and brought to the Western world by
Gerbert of Aurillac (the future Pope Sylvester II) after his stay in Andalusia at the end of the
10th century.

The French word "chiffre" (digit) is a corruption of the Arabic word "sifr", meaning
"zero". In Italian, "zero" is "zero", which seems to be a contraction of "zefiro"; we again see
an Arabic root here, although the "zero" could also be of Indian origin... So the words "chiffre"
and "zero" have the same origin.

The early use of a numerical symbol for "nothing", in the sense of "no amount", i.e. our
"zero", comes from the fact that the Indians used a so-called "positional system". In such a system,
the position of a digit in the written number indicates the power of 10 it multiplies, and the digit
itself the number of times that power occurs. Without a symbol marking an empty position, such a
system creates serious reading problems and can lead to large errors in calculations. The simple
but revolutionary introduction of the concept of "nothing" made it possible to read numbers back
without error.

The absence of a power is denoted by a small circle...: the zero. Our current system is thus the 
"decimal and positional system". 


$$324 = \underbrace{3 \times 100}_{\text{hundreds } (10^2)} + \underbrace{2 \times 10}_{\text{tens } (10^1)} + \underbrace{4 \times 1}_{\text{units } (10^0)}$$

Figure 4.3 - Description of decimal and positional system 

The number 324 is written from left to right as three hundred: 3 times 100, two tens: 2 
times 10 and four units: 4 times 1. 

Thus a "decimal number" is a number that has a finite representation in base 10.

We sometimes see (and this is recommended) a thousands separator, represented by a comma in
the United States (placed after every group of three digits, counting from the right, for whole
numbers). Thus, we write 1,034 instead of 1034, or 1,344,567,569 instead of 1344567569. Thousands
separators make it possible to quickly gauge the magnitude of the numbers we read.


• If we see only one comma, we know that the number is in the thousands

• If we see two commas, we know that the number is in the millions

• If we see three commas, we know that the number is in the billions

• etc.

and so on... also with decimals this gives: 

Figure 4.4 - Scale representation of the positional system 
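The thousands-separator convention described above can be sketched in a few lines of Python. The function name `group_thousands` is our own; the last line uses Python's built-in `,` format specifier, which produces the same grouping.

```python
def group_thousands(n: int, sep: str = ",") -> str:
    """Insert `sep` after every group of three digits, counting from the right."""
    digits = str(abs(n))
    groups = []
    while digits:
        groups.append(digits[-3:])   # take the last three digits
        digits = digits[:-3]         # and drop them from the remainder
    sign = "-" if n < 0 else ""
    return sign + sep.join(reversed(groups))

print(group_thousands(1034))         # 1,034
print(group_thousands(1344567569))   # 1,344,567,569
print(f"{1344567569:,}")             # same result with the built-in specifier
```

Counting the separators then gives the magnitude directly: three commas in 1,344,567,569 indicate billions.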

In fact, any integer greater than one can be taken as the base of a numbering system. We have
for example the binary, ternary, quaternary, ..., decimal and duodecimal numbering systems, which
correspond respectively to the bases two, three, four, ..., ten and twelve.

A generalization of what has been seen above can be written as follows:

Any positive integer N can be represented in a base b as a sum in which each coefficient $a_i$ is
multiplied by its respective weight $b^i$, such that:

$$N = a_{n-1} b^{n-1} + a_{n-2} b^{n-2} + \dots + a_1 b^1 + a_0 b^0 \tag{2.1}$$

More elegantly written:

$$N = \sum_{i=0}^{n-1} a_i b^i$$

with $a_i \in [0, b-1]$ and $b^i \in [1, b^{n-1}]$
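To make the representation above concrete, here is a minimal Python sketch that extracts the coefficients $a_i$ of a non-negative integer in a base b by repeated division. The function name `to_base` is an assumption of ours, and the digits are returned as a list of integers, most significant first.

```python
def to_base(n: int, b: int) -> list[int]:
    """Return the digits a_{k-1}, ..., a_1, a_0 of n in base b."""
    if b < 2:
        raise ValueError("the base must be an integer >= 2")
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    digits = [n % b]              # a_0 is the remainder of the first division
    n //= b
    while n > 0:
        digits.append(n % b)      # each further remainder is the next coefficient
        n //= b
    return digits[::-1]           # most significant digit first

print(to_base(324, 10))   # [3, 2, 4]
print(to_base(6, 2))      # [1, 1, 0]
print(to_base(324, 12))   # [2, 3, 0]
```

Each coefficient lies between 0 and b - 1, as required, and summing $a_i b^i$ over the returned digits gives back N.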


2.1 Digital Bases 

To write a number in a base b system, we must first adopt b characters to represent
the first b numbers; for example, in the decimal system: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
These characters are, as we have already defined them, the "digits", which we pronounce as usual
{zero, one, two, three, four, five, six, seven, eight, nine}.

For written numbers, we adopt the convention that a digit placed to the left of another
represents units of the next higher order, that is, b times larger. To take the place of units that
may be lacking in certain orders, we use the zero "0"; consequently, the number of digits
may vary.

Definition (#17): For spoken numbers, we agree to name "single unit", "ten", "hundred",
"thousand", etc., the units of the first, second, third, fourth order, etc. Thus the numbers
10, 11, ..., 19 will be read in the same way in all numbering systems. The numbers
1a, 1b, a0, b0, ... will be read ten-a, ten-b, a-ten, b-ten, etc. Thus, the number 5b6a71c will be
read:

five million b-hundred sixty-a thousand seven hundred ten-c

This small example is relevant because it shows that the spoken language we use daily is
intuitively tied to base ten (a consequence of our education).




R1. The rules of mathematical operations defined for numbers written in the decimal
system are the same for numbers written in any numbering system.

R2. To operate quickly in any numbering system, it is useful to know by heart all sums
and products of two single-digit numbers.

R3. The decimal system seems to have its origin in the fact that human beings have ten fingers.

Let us see how to convert a number from one numbering system to another:

E1. In base ten, we have seen above that 142,713 is written as:

$$142{,}713_{10} = 1 \cdot 10^5 + 4 \cdot 10^4 + 2 \cdot 10^3 + 7 \cdot 10^2 + 1 \cdot 10^1 + 3 \cdot 10^0 \tag{2.4}$$

E2. The number 0110 in base two (binary base) is written in base 10 as:

$$0110_2 = 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0 = 6_{10} \tag{2.5}$$
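The two examples above can be reproduced with a short Python sketch that evaluates a digit string in a given base. The function name `parse_in_base` is ours; it uses Horner's scheme (multiply the running value by b, then add the next digit), which gives the same result as summing the weighted powers. Python's built-in `int(text, base)` is shown for comparison.

```python
def parse_in_base(text: str, b: int) -> int:
    """Value of the digit string `text` read in base b (2 <= b <= 36)."""
    value = 0
    for ch in text:
        d = int(ch, 36)           # '0'-'9' give 0..9, letters give 10..35
        if d >= b:
            raise ValueError(f"digit {ch!r} is not valid in base {b}")
        value = value * b + d     # Horner step: shift one position, add the digit
    return value

print(parse_in_base("0110", 2))      # 6
print(parse_in_base("142713", 10))   # 142713
print(int("0110", 2))                # 6, using the built-in conversion
```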

and so on... The reverse operation is often a little trickier (for example the case of the binary 


2.2 Type of Numbers 

Now that we know that a number is a mathematical object used to count, measure and label, it
must be known that there exists in mathematics a wide variety of numbers (natural, rational, real,
irrational, complex, p-adic, quaternions, transfinite, algebraic, constructible, etc.), since any
mathematician may at leisure create his or her own numbers just by defining axioms (rules) for
manipulating them (see section Set Theory).

However, there are a few of them that we meet much more often than others throughout this book,
and some that serve as a basis for constructing others; these should be defined sufficiently
rigorously (without going to extremes) so that we know what we are talking about when we
use them.

2.2.1 Natural Integer Numbers 

The idea of "integer" (a number with no decimals) is the fundamental concept
of mathematics and arises from the sight of a group of objects of the same type (a sheep, another
sheep, yet another sheep, etc.).

When the number of objects in one group is different from that of another group, we say that
one group is numerically larger or smaller than the other, regardless of the type of objects in these
groups. When the number of objects in two or more groups is the same, we speak
of "equality".

To each single object, the number "one" or "unit", denoted by "1" in the decimal system, will be
associated.

To form groups of objects, we can proceed as follows: to an object, add another object, then
another, and so on... Each of the groups thus formed is, as a collection, characterized
by a number. It follows that a number can be regarded as representing a group
of units (single items) such that each unit corresponds to one single object of the collection.

Definition (#18): Two numbers are said to be "equal" if to each of the units of one we can match a
unique unit of the other and vice versa (in a bijective way, as seen in the section Set Theory).
If this does not hold, we speak of "inequality".

Let us take an object, then another, then add one more object to the group thus formed, and so on.
The groups thus formed are characterized by numbers which, taken in the same order as the groups
successively obtained, form the "natural sequence", also sometimes named the "whole numbers",
and denoted by:

$$\mathbb{N} = \{0, 1, 2, 3, \dots\}$$

To be unambiguous about whether 0 is included or not, an index (or superscript) is sometimes
added when it is excluded:

$$\mathbb{N}^* = \mathbb{N}_1 = \{1, 2, 3, \dots\}$$


The presence of 0 (zero) in our definition of $\mathbb{N}$ is debatable, since it is neither positive
nor negative. That is why in some books you will find a definition of $\mathbb{N}$ without 0.

The elements of this natural set can be defined (we owe this definition to the mathematician
Gottlob Frege) by the following properties (having first read the section Set Theory is
strongly recommended...):

P1. 0 (read "zero") is the number of elements (defined as an equivalence relation) of all sets
equivalent to (in bijection with) the empty set.

P2. 1 (read "one") is the number of elements of all sets equivalent to the set whose only
element is 0.

P3. 2 (read "two") is the number of elements of all sets equivalent to the set whose only
elements are 0 and 1.

P4. In general, an integer is the number of elements of all sets equivalent to the set of integers
preceding it!

The construction of the set of natural numbers is made in the most natural and consistent manner.
Natural numbers get their name from what they were used for at the beginning of their existence:
counting quantities and things found in nature or involved in human life. The originality of this set
lies in the empirical way it has been built: it is not really the result of a mathematical
definition, but rather of human awareness of the concepts of countable quantity, of number,
and of the operations that reflect the relations between them.

The question of the origin of $\mathbb{N}$ is therefore the question of the origin of mathematics. For
thousands of years, debates among the greatest philosophical minds
have attempted to elucidate this deep mystery: is mathematics a pure creation of
the human mind, or has man only rediscovered a science that already existed in
nature? Besides the many philosophical questions that the set of natural numbers can raise,
it is nonetheless interesting from an exclusively mathematical point of view. Because of its
structure, it has remarkable properties that can be very useful when we carry out certain
reasonings or calculations.

The sequence of natural numbers is unlimited (see section Theory Of Numbers) but countable
(we will see this property in detail below), because given a group of objects represented by a
number n, it is enough to add one object to get another group, which is then defined by the
integer n + 1.

Definition (#19): Two integers that differ by a single positive unit are said to be "consecutive".

2.2.1.1 Peano axioms

During the crisis of the foundations of mathematics, mathematicians naturally sought
to axiomatize the set $\mathbb{N}$, and we owe the current axiomatization to Peano and Dedekind.

The axioms of this system include the symbols < and = to represent the relations "smaller than"
and "equal to" (see section Operators). They also include the symbol "0" for the number zero
and s to represent the "successor" number. In this system, 1 is denoted by:

$$1 = s(0) \tag{2.10}$$

named "successor of zero", and 2 is denoted by:

$$2 = s(s(0)) = s(1) \tag{2.11}$$

The Peano axioms that build $\mathbb{N}$ are (see section Proof Theory for details on some of the
symbols used below):

A1. 0 is a natural number (this ensures that $\mathbb{N}$ is not empty).

A2. Every natural number n has a successor, denoted by s(n), and s is an injective
application (see section Set Theory), that is to say:

$$\forall x, y \quad s(x) = s(y) \Rightarrow x = y \tag{2.12}$$

In other words, if two successors are equal, they are the successors of the same number.

A3. The successor of a natural number is never zero (therefore $\mathbb{N}$ has a first element):

$$\forall x \quad \neg (s(x) = 0) \tag{2.13}$$

A4. If a property $\varphi$ is true for 0 and if, whenever it is true for x, it is also true for the
successor s(x), then this property is true for any x ("axiom of recurrence"):

$$\big(\varphi(0) \wedge \forall x\, (\varphi(x) \Rightarrow \varphi(s(x)))\big) \Rightarrow \forall x\ \varphi(x) \tag{2.14}$$

So the set of all the numbers satisfying the four axioms above is denoted by:

$$\mathbb{N} = \{0, 1, 2, 3, \dots, n, \dots\}$$
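The successor notation can be made tangible with a small Python sketch. Everything here is an illustrative assumption of ours: numerals are encoded as nested tuples, and `add` and `mul` use the usual recursive definitions of addition and multiplication from the successor (which we assume match those of the section Operators).

```python
ZERO = ()                      # the symbol 0

def s(n):
    """Successor: wrap n in one more tuple layer."""
    return (n,)

def pred(n):
    """Predecessor of a non-zero numeral (0 is nobody's successor, cf. A3)."""
    if n == ZERO:
        raise ValueError("0 is not the successor of any natural number")
    return n[0]

def add(m, n):
    """m + 0 = m  and  m + s(k) = s(m + k)."""
    return m if n == ZERO else s(add(m, pred(n)))

def mul(m, n):
    """m * 0 = 0  and  m * s(k) = (m * k) + m."""
    return ZERO if n == ZERO else add(mul(m, pred(n)), m)

def to_int(n):
    """Count the successor layers to display a numeral as an ordinary integer."""
    count = 0
    while n != ZERO:
        n, count = pred(n), count + 1
    return count

ONE = s(ZERO)
TWO = s(ONE)
THREE = s(TWO)
print(to_int(add(TWO, THREE)))   # 5
print(to_int(mul(TWO, THREE)))   # 6
```

With this encoding, axiom A2 holds by construction: two tuples `(x,)` and `(y,)` are equal only if `x` and `y` are equal.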



The Peano axioms make it possible to build very rigorously the two basic operations of arithmetic
in $\mathbb{N}$, namely addition and multiplication (see section Operators), and from there all the
other sets that we will see later (subtraction cannot always be applied in $\mathbb{N}$ because it can
give negative numbers).

Odd, Even and Perfect Numbers

In arithmetic, studying the parity of an integer means determining whether or not this integer is a
multiple of 2. An integer that is a multiple of 2 is an even integer; the others are odd integers.

Definitions (#20):

D1. The numbers obtained by counting in steps of 2 from zero (i.e. 0, 2, 4, 6, 8, ...) in the set
of natural integers $\mathbb{N}$ are named "even numbers".

The n-th even number (counting from n = 0) is obviously given by the relation:

$$E: \quad 2n = n + n \quad \forall n \in \mathbb{N} \tag{2.16}$$

D2. The numbers obtained by counting in steps of 2 starting from 1 (i.e. 1, 3, 5, 7, ...) in the set
of natural integers $\mathbb{N}$ are named "odd numbers".

The (n + 1)-th odd number is almost obviously given by the relation:

$$O: \quad 2n + 1 = E + 1 \quad \forall n \in \mathbb{N} \tag{2.17}$$


We name "perfect numbers" the numbers equal to the sum of their integer divisors strictly
smaller than themselves (a concept we will see in detail later), such as 6 = 1 + 2 + 3 and
28 = 1 + 2 + 4 + 7 + 14.
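The definition of a perfect number translates directly into a brute-force check. The following Python sketch (function names are ours) is only meant to illustrate the definition on small numbers; it is far too slow for large ones.

```python
def proper_divisor_sum(n: int) -> int:
    """Sum of the positive divisors of n that are strictly smaller than n."""
    return sum(d for d in range(1, n) if n % d == 0)

def is_perfect(n: int) -> bool:
    """True if n equals the sum of its divisors strictly smaller than itself."""
    return n > 1 and proper_divisor_sum(n) == n

perfect_below_1000 = [n for n in range(1, 1000) if is_perfect(n)]
print(perfect_below_1000)   # [6, 28, 496]
```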

Prime Numbers

Definition (#21): A "prime number" is an integer with exactly two positive divisors
(these divisors are "1" and the number itself). If there are more than two
divisors, the number is named a "composite number". The property of being prime (or not) is named
"primality".

The study of prime numbers is a huge subject in mathematics (see, for a small sample, the
section Number Theory or the section Cryptography). There are books of thousands of pages on the
subject and probably hundreds of research articles per month even nowadays. Most theorems are
far beyond the scope of this book (and beyond the interest of its main author...)!

Here is the set of prime numbers less than 1000: 

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197,
199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311,
313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431,
433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557,
563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661,
673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809,
811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937,
941, 947, 953, 967, 971, 977, 983, 991, 997

The whole set of prime numbers is sometimes denoted by P. 
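The list of primes below 1000 can be regenerated with the classical sieve of Eratosthenes. This is a minimal Python sketch of that well-known technique (the function name `primes_below` is ours); it is not necessarily how the table in the book was produced.

```python
def primes_below(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p with 2 <= p < limit."""
    if limit < 2:
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False          # 0 and 1 are not prime
    p = 2
    while p * p < limit:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p * p.
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]

primes = primes_below(1000)
print(len(primes))    # 168
print(primes[:10])    # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```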


Note that the set of prime numbers does not include the number "1", because it has only a
single divisor (itself) and not two as the definition requires.

We can ask ourselves if there are infinitely many prime numbers? The answer is YES and here 
is a proof (among others) by contradiction. 

Proof 4.3.2. Suppose that there is a finite number of prime numbers, denoted by:

$$p_1, p_2, \dots, p_n \tag{2.18}$$

We create a new number from the product of these prime numbers, to which we add 1:

$$N = (p_1 p_2 \cdots p_n) + 1 \tag{2.19}$$

According to our initial hypothesis and the fundamental theorem of arithmetic (see section
Number Theory), the new number N must be divisible by one of the existing primes $p_i$, such
that we can write:

$$N = q \cdot p_i \tag{2.20}$$

where q is an integer. We can carry out the division:

$$q = \frac{(p_1 p_2 \cdots p_n) + 1}{p_i} = \frac{p_1 p_2 \cdots p_n}{p_i} + \frac{1}{p_i} \tag{2.21}$$

The first term simplifies because $p_i$ is in the product. Let us denote the resulting integer by E:

$$q = E + \frac{1}{p_i} \tag{2.22}$$

But q and E are integers, so $1/p_i$ should be an integer. But $p_i$ is by definition greater than 1,
so $1/p_i$ is not an integer, and neither is q.

There is then a contradiction, and we can conclude that the prime numbers are not finite in number
but infinitely many.

□ Q.E.D. 
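The key step of the proof, namely that N leaves a remainder of 1 when divided by each prime of the list, can be observed numerically. The Python sketch below uses helper names of our own. Note that N itself does not have to be prime: the argument only guarantees that its prime factors are missing from the list.

```python
from math import prod

def euclid_number(primes: list[int]) -> int:
    """N = p1 * p2 * ... * pn + 1 for the given list of primes."""
    return prod(primes) + 1

def smallest_prime_factor(n: int) -> int:
    """Smallest prime divisor of n >= 2, found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

primes = [2, 3, 5, 7, 11, 13]
N = euclid_number(primes)                  # 30031
remainders = [N % p for p in primes]       # every remainder equals 1
new_prime = smallest_prime_factor(N)       # 59, a prime that is not in the list
print(N, remainders, new_prime)
```

Here 30031 = 59 × 509 is composite, but both of its prime factors are new.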


R1. The product $p_n\# = p_1 p_2 \cdots p_n$ of the prime numbers with index $\le n$ is named the
"n-th primorial".

R2. We refer the reader to the section Cryptography of the chapter Theoretical Computing
(or to the section Number Theory of the chapter Arithmetic) for the study of some
remarkable properties of prime numbers, including the famous Euler $\varphi$ function (also named
"indicator function") and a 20th-21st century industrial application of prime numbers.


2.2.2 Relative Integer Numbers 

The set of natural integers $\mathbb{N}$ has a few issues that we did not set out earlier. For example,
subtracting two numbers in $\mathbb{N}$ does not always have a result in $\mathbb{N}$ (negative numbers do not
exist in this set). Another issue: dividing two numbers in $\mathbb{N}$ also does not always have a result
in $\mathbb{N}$ (fractional numbers do not exist in this set). We then say, in the language
of set theory, that subtraction and division are not internal operations of $\mathbb{N}$.

We can first resolve the problem of subtraction by adding to the set of natural numbers $\mathbb{N}$ the
negative integers (a revolutionary concept for those who were behind it in their time!)
to get the set of "relative integers", denoted by $\mathbb{Z}$ (for "Zahl", from German, meaning "Number"):

$$\mathbb{Z} = \{\dots, -3, -2, -1, 0, 1, 2, 3, \dots\}$$

The set of natural integers is therefore included in the set of relative integers. This is what we
denote by (see section Set Theory):

$$\mathbb{N} \subset \mathbb{Z}$$

and we have by definition (these are notations to be learned!!!):

$$\mathbb{Z}^{+} = \mathbb{Z}_{>0} = \{n \in \mathbb{Z} \mid n > 0\} = \mathbb{N}^*$$

$$\mathbb{Z}_0^{+} = \mathbb{Z}_{\ge 0} = \{n \in \mathbb{Z} \mid n \ge 0\} = \mathbb{N}$$

$$\mathbb{Z}^{-} = \mathbb{Z}_{<0} = \{n \in \mathbb{Z} \mid n < 0\}$$

$$\mathbb{Z}^* = \mathbb{Z}_{\ne 0} = \{n \in \mathbb{Z} \mid n \ne 0\}$$

This set was originally created to make the natural numbers an object that we name a "group"
(see section Set Theory) with respect to addition.

Definition (#22): We say that a set A is a "countable set" if it is equipotent to $\mathbb{N}$, that is to
say if there is a bijection (see section Set Theory) of A onto $\mathbb{N}$. Thus, roughly speaking, two
equipotent sets have the same number of elements in the sense of their cardinal (see section Set
Theory), or at least the same infinity.

The purpose of this concept is to understand that the sets N and Z are countable. 

Proof 4.3.3. Let us show that $\mathbb{Z}$ is countable by writing:

$$x_{2k} = k \quad \text{and} \quad x_{2k+1} = -k - 1$$

for any integer $k \ge 0$. This gives the following ordered list:

0, -1, 1, -2, 2, -3, 3, ...

of all relative integers, obtained from natural integers only!

□ Q.E.D. 
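The enumeration used in this proof is easy to check by machine. In the Python sketch below, `nat_to_int` implements $x_{2k} = k$ and $x_{2k+1} = -k - 1$ as in the proof, while `int_to_nat` is the inverse map, which we add ourselves to make the bijection explicit (both names are ours).

```python
def nat_to_int(n: int) -> int:
    """x_{2k} = k and x_{2k+1} = -k - 1."""
    k, r = divmod(n, 2)
    return k if r == 0 else -k - 1

def int_to_nat(z: int) -> int:
    """Inverse map: z >= 0 is sent to 2z, z < 0 is sent to -2z - 1."""
    return 2 * z if z >= 0 else -2 * z - 1

print([nat_to_int(n) for n in range(7)])   # [0, -1, 1, -2, 2, -3, 3]
```

Composing the two maps in either order gives back the starting value, which is exactly what being a bijection between $\mathbb{N}$ and $\mathbb{Z}$ means.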



2.2.3 Rational Numbers 

The set of relative integers $\mathbb{Z}$ still has an issue too. Dividing two numbers in $\mathbb{Z}$ does not
always have a result in $\mathbb{Z}$ (fractional numbers do not exist in this set).
We then say, in the language of set theory, that division is not an internal operation of $\mathbb{Z}$.

We can thus define a new set that contains all the numbers which can be written as a "fraction",
that is to say the ratio of a dividend (numerator) and a divisor (denominator). When a number
can be written in this form, we say that it is a "fractional number":

$$\frac{\text{Numerator}}{\text{Denominator}} \tag{2.28}$$

A fraction can be used to express a part or fraction of something (of an object, of a distance, of 
a land, of an amount of money, of a cake...). 

To better understand rational numbers (fractions), let us consider two individuals, Andy and
Bobby, who both love pizza. On Monday night, they share a pizza equally. How much of the
pizza does each one get? Are you thinking that each boy gets half of the pizza? That's right.
There is one whole pizza, evenly divided into two parts, so each boy gets one of the two equal
parts. In math, we write $\frac{1}{2}$ to mean one out of two parts:

Figure 4.6 - A 1/2 fraction example (source: OpenStax) 

On Tuesday, Andy and Bobby share a pizza with their parents, Fred and Christy, with each
person getting an equal amount of the whole pizza. How much of the pizza does each person
get? There is one whole pizza, divided evenly into four equal parts. Each person has one of the
four equal parts, so each person gets $\frac{1}{4}$ of the pizza:

Figure 4.7 - A 1/4 fraction example (source: OpenStax) 

On Wednesday, the family invites some friends over for a pizza dinner. There are a total of 12
people. If they share the pizza equally, each person would get $\frac{1}{12}$ of the pizza:

Figure 4.8 - A 1/12 fraction example (source: OpenStax)

By definition, the "set of rational numbers" is given by:

$$\mathbb{Q} = \left\{ \frac{p}{q} \;\middle|\; (p, q) \in \mathbb{Z} \times \mathbb{Z},\ q \ne 0 \right\}$$

In other words, a rational number is any number that can be expressed as the quotient or
fraction p/q of two integers, p and q, with the denominator q not equal to zero. Since q may be
equal to 1, every integer is a rational number.

We also assume as obvious that:

$$\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \tag{2.30}$$

The logic behind the creation of the set of rational numbers $\mathbb{Q}$ is similar to that of the relative
integers $\mathbb{Z}$. Indeed, mathematicians wanted to extend the set of relative integers $\mathbb{Z}$ so as to obtain
a "group" with respect to the law of multiplication (and hence division) (see section Set Theory).

Moreover, contrary to the intuition of most people, the set of natural integers $\mathbb{N}$ and the set of
rational numbers $\mathbb{Q}$ are equipotent. We can convince ourselves of this equipotence by ranking, as
Cantor did, the rational numbers as follows:


1/1   2/1   3/1   4/1   5/1   6/1   7/1   ...
1/2   2/2   3/2   4/2   5/2   6/2   7/2   ...
1/3   2/3   3/3   4/3   5/3   6/3   7/3   ...
1/4   2/4   3/4   4/4   5/4   6/4   7/4   ...
1/5   2/5   3/5   4/5   5/5   6/5   7/5   ...
1/6   2/6   3/6   4/6   5/6   6/6   7/6   ...
1/7   2/7   3/7   4/7   5/7   6/7   7/7   ...
...

Figure 4.9 - Cantor diagonal method 

This table is traversed diagonal by diagonal, so that each rational number is counted only once
(in the sense of its decimal value), hence the name of the method: "Cantor diagonal".

If we eliminate from each diagonal the rational numbers that have already appeared (the
"equivalent fractions") in order to keep only those that are irreducible (i.e. those whose
numerator and denominator have a greatest common divisor equal to 1), then we can
define an application $f : \mathbb{N} \to \mathbb{Q}$ that is injective (two distinct rational numbers have
distinct ranks) and surjective (a rational number is written at every rank).

The application f is therefore bijective: $\mathbb{N}$ and $\mathbb{Q}$ are then effectively equipotent!
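The diagonal ranking can be sketched as a Python generator. This is a simplified variant under our own assumptions: it walks each diagonal (fractions whose numerator and denominator have a constant sum) in one direction only, instead of the zig-zag path of the figure, and it lists only the positive rationals. The name `positive_rationals` is ours.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def positive_rationals():
    """Yield every positive rational exactly once, diagonal by diagonal."""
    for total in count(2):                 # total = numerator + denominator
        for num in range(1, total):
            den = total - num
            if gcd(num, den) == 1:         # keep only irreducible fractions
                yield Fraction(num, den)

first = list(islice(positive_rationals(), 10))
print([str(r) for r in first])
# ['1', '1/2', '2', '1/3', '3', '1/4', '2/3', '3/2', '4', '1/5']
```

The rank of a fraction in this list is the natural number associated with it. Zero and the negative rationals can then be interleaved with the same alternating trick used to enumerate the relative integers.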

The somewhat more rigorous (and therefore less fun) definition of $\mathbb{Q}$ from $\mathbb{Z}$ is as follows (it
is interesting to see the notation used):

On the set $\mathbb{Z} \times \mathbb{Z} \setminus \{0\}$, which should be read as the set of pairs of relative integers
in which zero is excluded for the second one, we consider the relation R between two
pairs of relative integers defined by:

$$(a, b) R (a', b') \Leftrightarrow ab' = a'b \tag{2.31}$$

We then easily verify that R is an equivalence relation (see section Operators) on $\mathbb{Z} \times \mathbb{Z} \setminus \{0\}$.

The set of equivalence classes for this relation R, denoted $(\mathbb{Z} \times \mathbb{Z} \setminus \{0\}) / R$, is by definition
$\mathbb{Q}$. That is to say, we write more rigorously:

$$\mathbb{Q} = (\mathbb{Z} \times \mathbb{Z} \setminus \{0\}) / R \tag{2.32}$$

The equivalence class of $(a, b) \in \mathbb{Z} \times \mathbb{Z} \setminus \{0\}$ is explicitly denoted by:

$$\frac{a}{b}$$

in accordance with the notation that everyone is accustomed to use.

We easily check that the addition and multiplication operations defined on $\mathbb{Z}$
pass without problems to $\mathbb{Q}$ by writing:

$$\frac{a}{b} \cdot \frac{p}{q} = \frac{a \cdot p}{b \cdot q}$$

$$\frac{a}{b} + \frac{p}{q} = \frac{aq + bp}{bq}$$


Moreover, these operations give $\mathbb{Q}$ the structure of a field (see section Set Theory), with
$\frac{0}{1}$ as neutral element for addition and $\frac{1}{1}$ as neutral element for multiplication. Thus, any
non-zero element of $\mathbb{Q}$ is invertible; indeed:

$$\frac{a}{b} \cdot \frac{b}{a} = \frac{ab}{ab} = \frac{1}{1}$$

which is also written more technically:

$$(ab, ab) R (1, 1)$$




Even if we might want to define $\mathbb{Q}$ as being the set $\mathbb{Z} \times \mathbb{Z} \setminus \{0\}$, where $\mathbb{Z}$ represents the
numerators and $\mathbb{Z} \setminus \{0\}$ the denominators of the rationals, this is not possible, because
we would then have for example $(1, 2) \ne (2, 4)$ whereas we expect an equality.

Hence the need to introduce an equivalence relation which enables us to identify, to return
to the previous example, (1, 2) with (2, 4). The relation R that we have defined does
not fall from heaven; indeed, the reader who has handled rationals so far without ever
having seen their formal definition knows that:

$$\frac{a}{b} = \frac{a'}{b'} \Leftrightarrow ab' = a'b \tag{2.37}$$

It is therefore almost natural to define the relation R as we have done. In particular,
regarding the above example, $\frac{1}{2} = \frac{2}{4}$ because $(1, 2) R (2, 4)$, and the problem is solved.
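The construction of rationals as pairs of integers modulo the relation R can be mimicked in Python with plain tuples. The function names `related`, `add` and `mul` are ours; the formulas are the ones given above. The point of the sketch is that equal rationals may be different pairs, and that the operations respect the relation.

```python
def related(x, y):
    """(a, b) R (a', b')  <=>  a * b' == a' * b, with b and b' non-zero."""
    (a, b), (a2, b2) = x, y
    return a * b2 == a2 * b

def add(x, y):
    """(a, b) + (p, q) = (a*q + b*p, b*q)."""
    (a, b), (p, q) = x, y
    return (a * q + b * p, b * q)

def mul(x, y):
    """(a, b) * (p, q) = (a*p, b*q)."""
    (a, b), (p, q) = x, y
    return (a * p, b * q)

print((1, 2) == (2, 4))             # False: the pairs themselves differ
print(related((1, 2), (2, 4)))      # True: they represent the same rational
print(add((1, 2), (1, 3)))          # (5, 6)
print(related(mul((2, 3), (3, 2)), (1, 1)))   # True: 2/3 and 3/2 are inverses
```

Replacing an operand by a related pair gives a related result, which is why the operations are well defined on the equivalence classes.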

In addition to the historical circumstances of its establishment, this new entity (set) is
distinguished from the relative numbers because it introduces the original and paradoxical concept
of partial quantities. This notion, which a priori does not make sense, finds its place in the human
mind thanks to geometry, where the ideas of a fraction of a length and of proportion are illustrated more



2.2.4 Irrational Numbers 

It may seem odd to present irrational numbers before real numbers (see further below), but
this is the order in which they were discovered in human history, and it
therefore seems more pedagogical to us to present them in this order.

So the set of rationals $\mathbb{Q}$ is limited and, sadly, not sufficient either. Indeed, we might think that
every mathematical computation with the commonly known operations stays within this set, but this
is not the case!


E1. Let us calculate the square root of two, which we denote $\sqrt{2}$ (think of the Pythagorean
theorem applied to a right triangle with two sides of length 1: the third side then has length
$\sqrt{2}$). Suppose it is rational. Then we should be able to express it as a/b, where,
by the definition of a rational, a and b are integers with no common factors. For this
reason, a and b cannot both be even numbers. There are three remaining possibilities:

1. a is odd (and b is even)

2. a is even (and b is odd)

3. a is odd (and b is odd)

We have:

$$\sqrt{2} = \frac{a}{b} \tag{2.38}$$

By squaring, this can be written:

$$2b^2 = a^2 \tag{2.39}$$

Since the square of an odd number is odd and the square of an even number is even,
case (1) is not possible, because $a^2$ would be odd and $2b^2$ would be even.

Case (2) is also impossible, because then we could write a = 2c, where c is some integer,
and so, taking the square, we have $a^2 = 4c^2$, that is to say an even number on both
sides of the equality. Substituting in $2b^2 = a^2$, we obtain after simplification $b^2 = 2c^2$.
Then $b^2$ would be odd while $2c^2$ would be even.

Case (3) is also impossible, because $a^2$ is then odd and $2b^2$ is even (whether b is even or
odd).

There are no solutions! That is to say, the starting assumption is false and there do
not exist two integers a and b such that $\sqrt{2} = a/b$.
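A computer cannot replace the parity argument above, but it can illustrate its conclusion on a finite range. The Python sketch below (names and the bound 10,000 are our choices) searches for integers a and b with $a^2 = 2b^2$ using exact integer arithmetic, and finds none.

```python
from math import isqrt

def is_perfect_square(n: int) -> bool:
    """True if n is the square of an integer (exact integer arithmetic)."""
    r = isqrt(n)
    return r * r == n

# Search for b with 1 <= b <= 10_000 such that 2 * b^2 is a perfect square a^2.
solutions = [b for b in range(1, 10_001) if is_perfect_square(2 * b * b)]
print(solutions)   # []
```

An empty list over a finite range proves nothing in general, of course; it only agrees with what the proof establishes for all integers.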


E2. Let us prove by contradiction that the famous Euler number e is irrational. To do
this, remember that e (see section Functional Analysis) can also be defined by the Taylor
series (see section Sequences and Series):

$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \dots + \frac{1}{n!} + \dots \tag{2.40}$$

Then if e were rational, it could be written in the form p/q (with q > 1, because we know
that e is not an integer). Let us multiply both sides of the equality by q!:

$$q!\,e = q! + \frac{q!}{1!} + \frac{q!}{2!} + \frac{q!}{3!} + \dots + \frac{q!}{q!} + \frac{q!}{(q+1)!} + \frac{q!}{(q+2)!} + \dots$$

The first member q!e would then be an integer, because by definition of the factorial:

$$q! = q \cdot (q-1) \cdot (q-2) \cdots 2 \cdot 1$$

so that $q!\,e = q! \cdot p/q = p \cdot (q-1)!$ is an integer.

The first terms of the second member of the relation before last, up to the
term q!/q! = 1, are also integers, because q!/m! simplifies if $q \ge m$. So by subtraction
we find:

$$q!\,e - \left( q! + \frac{q!}{1!} + \frac{q!}{2!} + \frac{q!}{3!} + \dots + \frac{q!}{q!} \right) = \frac{q!}{(q+1)!} + \frac{q!}{(q+2)!} + \dots$$

where the right-hand side should then be an integer!

After simplification, the second member of the equality becomes:

$$\frac{1}{q+1} + \frac{1}{(q+1)(q+2)} + \dots$$

The first term of this sum is strictly less than 1/2, the second strictly less than 1/4,
the third strictly less than 1/8, etc.

So each term is strictly less than the corresponding term of the following geometric series, which
converges to:

$$\sum_{k=1}^{+\infty} \frac{1}{2^k} = 1$$

and therefore the sum, being strictly positive and strictly less than 1, is not an integer. This is a
contradiction.

Thus, the rational numbers are not sufficient to express the numerical values of $\sqrt{2}$ and e (to
cite only these two particular examples).
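The two facts used in the proof for e can be checked exactly with rational arithmetic. In the Python sketch below (function names are ours), `scaled_head` is q! times the series truncated at 1/q!, and `scaled_tail` sums only the first 40 terms of the remainder, which is an assumption of the sketch: the true remainder is an infinite sum, slightly larger, but still below the bound 1/q obtained by comparison with a geometric series of ratio 1/(q+1).

```python
from fractions import Fraction
from math import factorial

def scaled_head(q: int) -> Fraction:
    """q! * (1 + 1/1! + 1/2! + ... + 1/q!), computed exactly."""
    return sum(Fraction(factorial(q), factorial(k)) for k in range(q + 1))

def scaled_tail(q: int, terms: int = 40) -> Fraction:
    """First `terms` terms of q!/(q+1)! + q!/(q+2)! + ..., computed exactly."""
    return sum(Fraction(factorial(q), factorial(q + j)) for j in range(1, terms + 1))

for q in range(2, 8):
    head, tail = scaled_head(q), scaled_tail(q)
    assert head.denominator == 1          # the head is always an integer
    assert 0 < tail < Fraction(1, q)      # the partial tail stays strictly between 0 and 1
    print(q, head, float(tail))
```

So q!e would be an integer plus a quantity strictly between 0 and 1, which is the contradiction used in the proof.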

They must therefore be complemented by the set of all numbers that cannot be written as a
fraction (the ratio of an integer dividend and a non-zero integer divisor), which
we name "irrational numbers". Finally we can say that:

Definition (#23): In mathematics, an "irrational number" is any real number that cannot be 
expressed as a ratio of integers. Irrational numbers cannot be represented as terminating or 
repeating decimals. 

2.2.5 Real Numbers 

Definition (#24): The union of rational and irrational numbers gives the set of "real numbers"
that we denote by:

$$\mathbb{R}$$

Mathematicians, in their usual rigour, have different techniques to define real numbers.
They use properties of topology (among others) and especially Cauchy sequences, but
that is another story that goes beyond the formal scope of this section. For a set-theoretic
definition of $\mathbb{R}$, the reader should refer to the section Set Theory.

Figure 4.10 - Simple number sets summary 

Obviously we are led to ask ourselves whether $\mathbb{R}$ is countable or not. The proof that it is not is
quite simple.

Proof 4.3.4. By definition, we have seen above that there must be a bijective correspondence
between $\mathbb{N}$ and $\mathbb{R}$ for $\mathbb{R}$ to be countable.

For simplicity, we will show that the interval [0, 1[ is not countable. This will of course
imply by extension that $\mathbb{R}$ is not countable!

The elements of this interval are represented by infinite sequences of digits between 0 and 9 (in
the decimal system):

• Some of these sequences are zero from a given rank onward, some are not.

• So we can identify [0, 1[ with the set of all sequences (finite or infinite) of integers between
0 and 9.

If this set were countable, we could classify these sequences (with a first, a second, etc.).
Thus, the sequence $x_{11} x_{12} x_{13} x_{14} \dots x_{1p} \dots$ would be classified first and so on... as proposed
in the above table.

We could then edit this infinite matrix as follows: to each element of the diagonal, we add
1, according to the rule: 0 + 1 = 1, 1 + 1 = 2, ..., 8 + 1 = 9 and 9 + 1 = 0:

Then let us consider the sequence on the diagonal:

- It cannot be equal to the first sequence (first row of the table before last),
since it differs from it at least by the first element.

- It cannot be equal to the second sequence (second row of the table before last),
since it differs from it at least by the second element.

- It cannot be equal to the third sequence (third row of the table before last),
since it differs from it at least by the third element.

and so on... It then cannot be equal to any of the sequences in this table!

So whatever the chosen classification of the infinite sequences of digits 0...9, there is always one
that escapes this classification! It is therefore impossible to number them... simply because they
do not form a countable set!

□ Q.E.D. 
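
The diagonal construction used in this proof can be sketched on a finite excerpt of such a table. The sketch below is only an illustration: the table `table` and the function name `diagonal_escape` are hypothetical and do not come from the text, and a finite excerpt obviously proves nothing about the infinite case. It simply applies the rule "add 1 to each diagonal digit, with 9 + 1 = 0" and shows that the resulting sequence differs from row i in position i.

```python
def diagonal_escape(rows):
    """Return a digit sequence that differs from rows[i] at position i.

    rows: list of n digit sequences (lists of ints in 0..9), each of length >= n.
    Rule from the text: add 1 to each diagonal digit, with 9 + 1 = 0.
    """
    return [(rows[i][i] + 1) % 10 for i in range(len(rows))]


# Hypothetical finite excerpt of a "classification" of decimal expansions.
table = [
    [1, 4, 1, 5],
    [7, 1, 8, 2],
    [3, 3, 3, 3],
    [0, 9, 9, 9],
]

new_row = diagonal_escape(table)
print(new_row)  # [2, 2, 4, 0]

# The new sequence differs from the i-th row at least in the i-th digit.
for i, row in enumerate(table):
    assert new_row[i] != row[i]
```

Whatever rows are supplied, the returned sequence cannot coincide with any of them, which is exactly the mechanism of the proof.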



The technique that has allowed us to achieve this result is known as the "Cantor diagonal process" 
(because it is similar to the one used to show the equipotence between the set of natural numbers 
and the set of rational numbers), and the set of real numbers is said to have the "power of the 
continuum" by the fact that it is uncountable. 

We assume that it is intuitive for the reader that any real number can be approximated 
arbitrarily closely by a rational number (for irrational numbers we simply stop at 
a given number of decimals and take the corresponding rational). Mathematicians therefore 
say that $\mathbb{Q}$ is "dense" in $\mathbb{R}$ and denote this by: 

$$\overline{\mathbb{Q}} = \mathbb{R} \qquad (2.47)$$

In business it is customary to communicate real numbers in percentages or per-thousand. 

Definitions (#25): 

D1. Given a scalar $x \in \mathbb{R}$, then expressed in percentage it will be denoted by: 

$$x\% = x \cdot 100 \qquad (2.48)$$

D2. Given a scalar $x \in \mathbb{R}$, then expressed in per-thousand it will be denoted by: 

$$x\,‰ = x \cdot 1{,}000 \qquad (2.49)$$

2.2.6 Transfinite Numbers 

We are now faced with an infinity of real numbers that is different from that of the natural numbers. 
Cantor then dared what no one had dared since Aristotle: the sequence of positive integers is 
infinite, so the set $\mathbb{N}$ is a set that has a countable infinity of elements; he then said that the 
cardinal (see section Set Theory) of this set was a number that existed as such, without using 
the catch-all symbol $\infty$, and he denoted it by: 

$$\aleph_0 = \mathrm{Card}(\mathbb{N}) \qquad (2.50)$$

This symbol is, as we know (see section Set Theory), the first letter of the Hebrew alphabet, 
pronounced "aleph zero". Cantor was going to name this strange number a "transfinite number". 

The decisive act is to assert that there is, after the finite, a transfinite, that is to say an unlimited 
scale of determined modes which by nature are infinite, and yet can be specified, as for the 
finite, by specific numbers, well defined and distinguishable from each other! This tool was 
necessary because the cardinal of a set can be equal to that of one of its parts, as we will see just below! 

After this first stroke, going against most ideas held for over two thousand years, Cantor would 
continue on his path and build the calculation rules, paradoxical at first glance, of the transfinite 
numbers. These rules were based, as we said earlier, on the fact that two infinite sets are equivalent 
if there exists a bijection between the two sets. 


Thus, we can easily show that the infinity of even numbers is equivalent to the infinity of 
integers: for this, it suffices to show that to every integer we can associate an even number, its 
double, and vice versa. Therefore the cardinal of the integers is equal to that of the even numbers (the 
cardinal of a set can be equal to that of one of its parts!). 

Thus, although the even numbers are included in the set of integers, there is an infinity $\aleph_0$ of 
them, and the two sets are equipotent. By stating that a set can be equipotent to one of its parts, Cantor 
goes against what seemed obvious to Aristotle and Euclid: the set of all sets is infinite! This 
will shake the whole of mathematics and will lead to the Zermelo-Fraenkel axiomatic that we will see 
in the section on Set Theory. 

From the above, Cantor defined the following calculation rules on the cardinals: 

$$\aleph_0 + 1 = \aleph_0 \qquad \aleph_0 + \aleph_0 = \aleph_0 \qquad \aleph_0^2 = \aleph_0$$

At first glance these rules seem non-intuitive, but in fact they are quite natural! Indeed, Cantor defined the 
addition of two transfinite numbers as the cardinal of the disjoint union of the corresponding sets. 

E1. Denoting by $\aleph_0$ the cardinal of $\mathbb{N}$, the sum $\aleph_0 + \aleph_0$ amounts to 
taking the cardinal of the disjoint union of $\mathbb{N}$ with $\mathbb{N}$. But as this disjoint union is 
equipotent to $\mathbb{N}$, we have $\aleph_0 + \aleph_0 = \aleph_0$ (to be convinced, it is enough to take the sets of odd 
and even integers, which are both countable and whose disjoint union is also countable). 

E2. Another trivial example: $\aleph_0 + 1$ corresponds to the cardinality of the union of $\mathbb{N}$ and one point. This 
set is still equipotent to $\mathbb{N}$, therefore $\aleph_0 + 1 = \aleph_0$. 

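We will also see, during our study of the section Set Theory, that the Cartesian product of 
two countable sets is such that we have: 

$$\mathrm{Card}(\mathbb{N} \times \mathbb{N}) = \mathrm{Card}(\mathbb{N}^2) = [\mathrm{Card}(\mathbb{N})]^2 \qquad (2.52)$$

and therefore: 

$$\aleph_0^2 = \aleph_0 \qquad (2.53)$$

Similarly (see section Set Theory), since $\mathbb{Z} = \mathbb{Z}^+ \cup \mathbb{Z}^-$ we have: 

$$\aleph_0 + \aleph_0 = \aleph_0 \qquad (2.54)$$

and identifying $\mathbb{Q}$ with $\mathbb{Z} \times \mathbb{Z}$ (ratio of a numerator over a denominator) we have immediately: 

$$\aleph_0 \times \aleph_0 = (\aleph_0)^2 = \aleph_0 \qquad (2.55)$$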
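
The equipotences behind these rules can be made concrete with explicit bijections. The sketch below is an illustration under assumptions: the text does not specify which bijections are used, so the doubling map and the classical Cantor pairing function are chosen here as standard examples, and all function names are hypothetical.

```python
def to_even(n):
    """Bijection N -> even naturals: each integer is sent to its double."""
    return 2 * n


def from_even(m):
    """Inverse map: each even natural is sent back to its half."""
    return m // 2


def cantor_pair(i, j):
    """Cantor pairing function: a bijection from N x N onto N."""
    return (i + j) * (i + j + 1) // 2 + j


def cantor_unpair(k):
    """Inverse of cantor_pair."""
    # Find the diagonal w = i + j, i.e. the largest w with w(w+1)/2 <= k.
    w = 0
    while (w + 1) * (w + 2) // 2 <= k:
        w += 1
    j = k - w * (w + 1) // 2
    return (w - j, j)


# Every pair (i, j) receives its own natural number and can be recovered.
for k in range(6):
    print(k, cantor_unpair(k))
```

The pairing walks through the grid $\mathbb{N} \times \mathbb{N}$ diagonal by diagonal, which is one way to see why $\aleph_0 \times \aleph_0 = \aleph_0$.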

We can also prove an interesting statement: if we consider the cardinality of the set of all the 
cardinals, it is necessarily greater than all the cardinals, including itself (it is better to have first 
read the section on Set Theory)! In other words: the cardinality of the set of all subsets of A 
is greater than the cardinal of A itself. 



This implies that there is no set containing all sets, since there is always a bigger one (it is an 
equivalent form of the famous old Cantor's paradox)! 

In technical language, this means considering a non-empty set A and then stating that: 

$$\mathrm{Card}(A) < \mathrm{Card}(\mathcal{P}(A)) \qquad (2.56)$$

where $\mathcal{P}(A)$ is the set of subsets of A (see the section Set Theory for the general calculation of 
the cardinal of the set of all parts of a countable set). 

That is to say, by definition of the order relation < (strictly less), it suffices to prove that there is 
no surjective application $f : A \to \mathcal{P}(A)$, in other words, that whatever f is, at least one element 
of the set of parts of A has no pre-image in A. 

The set $\mathcal{P}(\mathbb{N})$ for example contains the set of even numbers, the set of odd numbers, the set of 
natural numbers itself, as well as the empty set, etc. $\mathcal{P}(\mathbb{N})$ is therefore the set of all "potatoes" 
(to borrow the vocabulary of high school...) that can be drawn in $\mathbb{N}$. 

Proof 4.3.5. Suppose that we can number each potato of $\mathcal{P}(A)$ with at least one element of 
A (imagine that with $\mathbb{N}$ or see the example in the section on Set Theory). In other words, this is 
equivalent to supposing that $f : A \to \mathcal{P}(A)$ is surjective. Let us then consider the subset E of A such that: 

$$E = \{x \in A \mid x \notin f(x)\} \qquad (2.57)$$

that is to say the set of elements x of A that do not belong to the set numbered by x (the element 
x does not belong to the "potato" that it numbers, in other terms...). 

Now, if f is surjective, there must exist a $y \in A$ for this subset E such that: 

$$f(y) = E = \{x \in A \mid x \notin f(x)\} \qquad (2.58)$$

since E is also a subset of A. 

Suppose that y belongs to E. In this case, by definition of E, $y \notin f(y) = E$ (the definition 
of E applies to every x, and x can obviously be y, z or anything else), which is a contradiction. 
Consequently $y \notin E$; but in this second case, since $y \notin f(y)$, the definition of E gives 
$y \in E$, which is again a contradiction. We see therefore that the element y cannot exist and therefore f cannot be surjective. 

We strongly recommend that the reader read the previous paragraph more than once if necessary. 

□ Q.E.D. 
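
For a small finite set, the argument of this proof can be checked exhaustively. The following sketch is only an illustration under assumptions: the set `A = {0, 1, 2}` and the helper names `power_set` and `diagonal_set` are hypothetical choices, and a finite check does not replace the proof. It enumerates every map f from A to its set of subsets and verifies that the set E of the proof is never among the images of f.

```python
from itertools import combinations, product


def power_set(A):
    """All subsets of A, as frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_set(A, f):
    """E = {x in A | x not in f(x)}, where f is a dict from A to subsets of A."""
    return frozenset(x for x in A if x not in f[x])


A = {0, 1, 2}
subsets = power_set(A)   # the 2**3 = 8 "potatoes" of A
elements = sorted(A)

# Enumerate every map f : A -> P(A); there are 8**3 = 512 of them.
missed_every_time = True
for images in product(subsets, repeat=len(elements)):
    f = dict(zip(elements, images))
    if diagonal_set(A, f) in f.values():
        missed_every_time = False

print(missed_every_time)  # True: E is never reached, so no f is surjective
```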


2.2.7 Complex Numbers 

Invented in the 16th century by, among others, Girolamo Cardano and Rafael Bombelli, "complex 
numbers" (also named "imaginary numbers") are used to solve problems that have no solutions 
in $\mathbb{R}$. They are also used to formalize mathematically certain transformations in the plane, such as 
rotation, similarity, translation, etc., and to generalize some theorems that are restricted to $\mathbb{R}$ and 
therefore hide some interesting results for practical engineering. For physicists, complex 
numbers are above all a very convenient way to simplify notations. It is thus very difficult 
to study wave phenomena, General Relativity or quantum mechanics without using complex 
numbers and expressions. 

There are several ways to construct complex numbers. The first is typical of the way 
mathematicians proceed as part of Set Theory: they define a pair of real numbers 
and define the operations between these pairs to finally arrive at a meaning of the concept of 
complex number. The second one is less rigorous but simpler, and consists in 
defining the pure unit imaginary number i and then building the arithmetic operations from its definition. 
We will opt for the second method in the texts that follow! 

Definitions (#26): 

1. We define the "unit pure imaginary number", which we denote by i, by the following property: 

$$i^2 = -1 \Leftrightarrow i = \sqrt{-1}$$

2. A "complex number" is a pair of a real number a and an imaginary number ib, generally 
written in the following form: 

$$z = a + ib$$

where a and b are numbers belonging to $\mathbb{R}$. 

3. We denote the set of complex numbers by $\mathbb{C}$ and therefore we have by construction: 

$$\mathbb{R} \subset \mathbb{C} \qquad (2.61)$$

The set $\mathbb{C}$ is identified with the oriented Euclidean plane E (see section Vector Calculus) 
thanks to the choice of a direct orthonormal basis (we therefore get the "Argand-Cauchy 
plane", also named "Gauss-Argand plane" or more commonly "Gauss plane", that we will 
see a little further below and that seems to have been defined for the first time in 1806). 

The set of complex numbers, which constitutes a field (see section Set Theory) and is denoted by $\mathbb{C}$, 
is defined (in a simple way to start) in the notation of set theory by: 

$$\mathbb{C} = \{z = x + iy \mid x, y \in \mathbb{R}\}$$




In other words, we say that the field $\mathbb{C}$ is the field $\mathbb{R}$ to which we have added the imaginary 
number i. This is formally denoted by: 

$$\mathbb{R}[i] \qquad (2.63)$$

The addition and multiplication of complex numbers are internal operations on the set (field) 
of complex numbers (we will come back in much more detail to certain properties of complex 
numbers in the section on Set Theory) and are defined by: 

$$z_1 + z_2 = (x_1 + x_2) + i(y_1 + y_2)$$
$$z_1 \cdot z_2 = (x_1x_2 - y_1y_2) + i(x_1y_2 + x_2y_1)$$

The "real part" of z is traditionally denoted by: 

$$\Re(z) = x$$

The "imaginary part" of z is traditionally denoted by: 

$$\Im(z) = y$$

The "conjugate" of z is defined by: 

$$\bar{z} = x - iy$$

and is sometimes also denoted $z^*$ (particularly in quantum physics in some books!). 

From a complex number and its conjugate, it is possible to find its real and imaginary parts. These are 
the following obvious relations: 

$$\Re(z) = \frac{z + \bar{z}}{2} \quad \text{and} \quad \Im(z) = \frac{z - \bar{z}}{2i}$$

The "module" of z (or "norm") is the length from the center of the Gaussian plane (see further 
below a figure of the Gaussian plane) and is simply calculated using the Pythagorean theorem: 

$$|z| = \sqrt{x^2 + y^2} = \sqrt{z \cdot \bar{z}}$$

and is always a positive number or equal to zero. 

We consider as obvious that it satisfies all the properties of a distance (see sections Topology 
and Vector Calculus). 

Remark 

The notation $|z|$ for the module is not innocent, since it coincides with the absolute value 
of z when z is real. 

The division of two complex numbers is calculated as follows (the denominator being obviously not 
zero): 

$$\frac{z_1}{z_2} = \frac{x_1 + iy_1}{x_2 + iy_2} = \frac{x_1 + iy_1}{x_2 + iy_2} \cdot \frac{x_2 - iy_2}{x_2 - iy_2} = \frac{(x_1x_2 + y_1y_2) - i(x_1y_2 - x_2y_1)}{x_2^2 + y_2^2}$$

The inverse of a complex number is calculated similarly: 

$$\frac{1}{x + iy} = \frac{x - iy}{(x + iy)(x - iy)} = \frac{x - iy}{x^2 + y^2} = \frac{x}{x^2 + y^2} - i\frac{y}{x^2 + y^2}$$
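
The component formulas above can be checked numerically. The sketch below is a minimal illustration, assuming a complex number is stored as a pair `(x, y)` of real numbers; the function names `c_add`, `c_mul`, `c_conj`, `c_abs` and `c_div` are hypothetical and simply transcribe the formulas of this section.

```python
import math


def c_add(z1, z2):
    (x1, y1), (x2, y2) = z1, z2
    return (x1 + x2, y1 + y2)


def c_mul(z1, z2):
    (x1, y1), (x2, y2) = z1, z2
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)


def c_conj(z):
    x, y = z
    return (x, -y)


def c_abs(z):
    x, y = z
    return math.sqrt(x * x + y * y)


def c_div(z1, z2):
    (x1, y1), (x2, y2) = z1, z2
    d = x2 * x2 + y2 * y2  # squared module of the denominator
    if d == 0:
        raise ZeroDivisionError("division by the zero complex number")
    return ((x1 * x2 + y1 * y2) / d, -(x1 * y2 - x2 * y1) / d)


z1, z2 = (3, 4), (1, -2)
print(c_mul(z1, z2))  # (11, -2)
print(c_div(z1, z2))  # (-1.0, 2.0)
print(c_abs(z1))      # 5.0
```

The same values are obtained with Python's built-in `complex` type, for instance `(3 + 4j) / (1 - 2j)`.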

We can therefore list 8 important properties of the module and the complex conjugate: 




P1. We affirm that: 

$$|z| = 0 \Leftrightarrow z = 0 \qquad (2.72)$$

Proof 4.3.6. By definition of the module, $|z| = \sqrt{x^2 + y^2}$, so for the sum $x^2 + y^2$ to be zero, 
the necessary condition, as $(x, y) \in \mathbb{R}^2$, is that: 

$$x = y = 0 \qquad (2.73)$$

□ Q.E.D. 

P2. We affirm that: 

$$|z| = |-z| = |\bar{z}| \qquad (2.74)$$

Proof 4.3.7. This is immediate by: 

$$|z| = \sqrt{x^2 + y^2} = \sqrt{(-x)^2 + (-y)^2} = |-z| = \sqrt{(x)^2 + (-y)^2} = |\bar{z}| \qquad (2.75)$$

□ Q.E.D. 

P3. We affirm that: 

$$|\Re(z)| \leq |z| \quad \text{with equality iff } z \text{ is real}$$
$$|\Im(z)| \leq |z| \quad \text{with equality iff } z \text{ is imaginary}$$

Proof 4.3.8. The two above inequalities can be written: 

$$|x| \leq \sqrt{x^2 + y^2}$$
$$|y| \leq \sqrt{x^2 + y^2}$$

thus equivalent respectively (after squaring both sides) to: 

$$x^2 \leq x^2 + y^2$$
$$y^2 \leq x^2 + y^2$$

which are trivial. The rest of the proof is therefore trivial! 

□ Q.E.D. 

P4. We have: 

$$\forall (z_1, z_2) \in \mathbb{C} \times \mathbb{C} \qquad |z_1z_2| = |z_1||z_2|$$

and if $z_2 \neq 0$: 

$$\left|\frac{z_1}{z_2}\right| = \frac{|z_1|}{|z_2|}$$



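Proof 4.3.9. First: 

$$|z_1z_2|^2 = (z_1z_2)(\overline{z_1z_2}) = (z_1z_2)(\bar{z}_1\bar{z}_2) = (z_1\bar{z}_1)(z_2\bar{z}_2) = |z_1|^2|z_2|^2 \Rightarrow |z_1z_2| = |z_1||z_2|$$

(we will prove a little further below that generally $\overline{z_1z_2} = \bar{z}_1\bar{z}_2$) and for $z_2 \neq 0$: 

$$\left|\frac{z_1}{z_2}\right|^2 = \frac{z_1}{z_2}\,\overline{\left(\frac{z_1}{z_2}\right)} = \frac{z_1}{z_2}\,\frac{\bar{z}_1}{\bar{z}_2} = \frac{z_1\bar{z}_1}{z_2\bar{z}_2} = \frac{|z_1|^2}{|z_2|^2}$$

and taking the square root finishes the proof. 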

□ Q.E.D. 

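P5. We affirm that: 

$$|z|^2 = z\bar{z} \qquad (2.83)$$

Proof 4.3.10. This is immediate: 

$$|z|^2 = \left(\sqrt{x^2 + y^2}\right)^2 = (x + iy)(x - iy) = x^2 - ixy + iyx + y^2 = x^2 + y^2 = z\bar{z} \qquad (2.84)$$

□ Q.E.D. 

P6. We affirm that: 

$$\forall z, z' \in \mathbb{C} \qquad \bar{\bar{z}} = z, \quad \overline{z + z'} = \bar{z} + \bar{z}', \quad \overline{zz'} = \bar{z}\bar{z}' \qquad (2.85)$$

Proof 4.3.11. The first one is immediate: 

$$\bar{\bar{z}} = \overline{\overline{x + iy}} = \overline{x - iy} = x + iy = z \qquad (2.86)$$

For the second one, with $z = x_1 + iy_1$ and $z' = x_2 + iy_2$: 

$$\overline{z + z'} = \overline{(x_1 + iy_1) + (x_2 + iy_2)} = \overline{(x_1 + x_2) + i(y_1 + y_2)} = (x_1 + x_2) - i(y_1 + y_2)$$
$$= (x_1 - iy_1) + (x_2 - iy_2) = \bar{z} + \bar{z}'$$

And for the third one: 

$$\overline{zz'} = \overline{(x_1 + iy_1)(x_2 + iy_2)} = \overline{(x_1x_2 - y_1y_2) + i(x_1y_2 + y_1x_2)}$$
$$= (x_1x_2 - y_1y_2) - i(x_1y_2 + y_1x_2) = (x_1 - iy_1)(x_2 - iy_2) = \bar{z}\bar{z}'$$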

□ Q.E.D. 


R1. In mathematical terms, the first proof shows that complex conjugation 
is what is named an "involution" (in the sense that applying it twice changes nothing...). 

R2. Also in mathematical terms (it is only vocabulary!), the second proof 
shows that the conjugation of the sum of two complex numbers is what we name 
a "group automorphism of $(\mathbb{C}, +)$" (see section Set Theory). 

R3. Again, for vocabulary... the third proof shows that the conjugation of the 
product of two complex numbers is what we name a "field automorphism of $(\mathbb{C}, +, \times)$" 
(see section Set Theory). 


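P7. We affirm that for z' different from zero: 

$$\overline{\left(\frac{1}{z'}\right)} = \frac{1}{\bar{z}'} \qquad \overline{\left(\frac{z}{z'}\right)} = \frac{\bar{z}}{\bar{z}'}$$

Proof 4.3.12. We will restrict ourselves to the proof of the second relation, which is the general 
case of the first (obtained for z = 1). 

$$\overline{\left(\frac{x_1 + iy_1}{x_2 + iy_2}\right)} = \overline{\left(\frac{x_1 + iy_1}{x_2 + iy_2} \cdot \frac{x_2 - iy_2}{x_2 - iy_2}\right)} = \overline{\left(\frac{(x_1x_2 + y_1y_2) + i(y_1x_2 - y_2x_1)}{x_2^2 + y_2^2}\right)}$$

$$= \frac{x_1x_2 + y_1y_2}{x_2^2 + y_2^2} - i\frac{y_1x_2 - y_2x_1}{x_2^2 + y_2^2} = \frac{(x_1x_2 + y_1y_2) + i(y_2x_1 - y_1x_2)}{x_2^2 + y_2^2}$$

$$= \frac{(x_1 - iy_1)(x_2 + iy_2)}{x_2^2 + y_2^2} = \frac{x_1 - iy_1}{x_2 - iy_2} \cdot \frac{x_2 + iy_2}{x_2 + iy_2} = \frac{\bar{z}}{\bar{z}'}$$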

□ Q.E.D. 

P8. We have: 

$$|z_1 + z_2| \leq |z_1| + |z_2| \qquad (2.91)$$

for any complex numbers $z_1, z_2$ (strictly speaking non-zero complex numbers, otherwise 
the concept of argument of the complex number that we will see further below is 
undetermined). Furthermore, the equality holds if and only if $z_1$ and $z_2$ are collinear (the vectors 
are "on the same straight line") and of the same direction, in other words, if there exists 
a real $\lambda \geq 0$ such that $\lambda z_1 = z_2$. 

Proof 4.3.13. Directly we have: 

$$|z_1 + z_2|^2 = |z_1|^2 + 2\Re(z_1\bar{z}_2) + |z_2|^2 \leq |z_1|^2 + 2|z_1\bar{z}_2| + |z_2|^2 = (|z_1| + |z_2|)^2 \qquad (2.92)$$

This inequality may not be obvious to everyone, therefore let us develop it a bit, 
assuming it is true: 

$$|z_1 + z_2|^2 \leq (|z_1| + |z_2|)^2$$

$$\left(\sqrt{(x_1 + x_2)^2 + (y_1 + y_2)^2}\right)^2 \leq \left(\sqrt{x_1^2 + y_1^2} + \sqrt{x_2^2 + y_2^2}\right)^2$$

$$(x_1 + x_2)^2 + (y_1 + y_2)^2 \leq (x_1^2 + y_1^2) + 2\sqrt{x_1^2 + y_1^2}\sqrt{x_2^2 + y_2^2} + (x_2^2 + y_2^2)$$

$$x_1^2 + 2x_1x_2 + x_2^2 + y_1^2 + 2y_1y_2 + y_2^2 \leq (x_1^2 + y_1^2) + 2\sqrt{x_1^2 + y_1^2}\sqrt{x_2^2 + y_2^2} + (x_2^2 + y_2^2)$$

After simplification: 

$$x_1x_2 + y_1y_2 \leq \sqrt{x_1^2 + y_1^2}\sqrt{x_2^2 + y_2^2}$$

If the left-hand side is negative this is trivially true; otherwise we can square both sides: 

$$(x_1x_2 + y_1y_2)^2 \leq (x_1^2 + y_1^2)(x_2^2 + y_2^2) \qquad (2.94)$$

$$x_1^2x_2^2 + 2x_1x_2y_1y_2 + y_1^2y_2^2 \leq x_1^2x_2^2 + x_1^2y_2^2 + y_1^2x_2^2 + y_1^2y_2^2$$



and again after simplification: 

$$2x_1x_2y_1y_2 \leq x_1^2y_2^2 + y_1^2x_2^2$$

$$0 \leq x_1^2y_2^2 - 2x_1x_2y_1y_2 + y_1^2x_2^2 \qquad (2.95)$$

$$0 \leq (x_1y_2 - y_1x_2)^2$$

As the squared bracket is necessarily positive or zero, it follows that: 

$$0 \leq [x_1y_2 - y_1x_2]^2 \qquad (2.96)$$

This last relation thus shows that the inequality is true. 

□ Q.E.D. 


In fact there is a more general form of this inequality named "Minkowski inequal- 
ity", proved in the section of Vector Calculus (complex numbers can indeed be 
written in the form of vectors as we will see later). 


2.2.7.1 Geometric Interpretation of Complex Numbers 

We can also represent any complex number $a + ib$ or $a - ib$ in a plane defined by two 
mutually orthogonal axes (two dimensions) of infinite length. The vertical axis 
represents the imaginary part of a complex number and the horizontal axis the real part (see 
figure below). 

So there is a correspondence between the set of complex numbers and the set of vectors of the 
Gaussian plane (notion of affix, as we will see in more depth in the section on Vector Calculus). 

We sometimes name this type of representation the "Gauss plane" or "Gauss map": 

and then we write: 

$$\mathrm{Aff}(\vec{r}) = a + ib$$



We see on this diagram that a complex number thus has a vector interpretation (see section 
Vector Calculus) given by: 

$$\vec{r} = a + ib$$

where the norm of the vector is: 

$$r = |z| = \sqrt{a^2 + b^2}$$

Thus, in the canonical basis, the number 1 is the unit basis vector carried by the horizontal 
axis $\mathbb{R}$, the number i is the unit basis vector carried by the vertical imaginary axis $i\mathbb{R}$, 
and r is the module (the norm), which is positive or zero. 

This has to be compared with the vectors of $\mathbb{R}^2$ (see section Vector Calculus): 

$$\vec{v} = x\vec{e}_1 + y\vec{e}_2 = x\begin{pmatrix}1\\0\end{pmatrix} + y\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}x\\y\end{pmatrix} \qquad (2.101)$$

$$\|\vec{v}\| = \sqrt{x^2 + y^2} \qquad (2.102)$$

so that we can identify the complex plane with the Euclidean plane. Thanks to the geometric 
interpretation of the Gaussian plane, the equality below is immediate for example and avoids 
making some developments: 

a + bi 
b + a 


In addition, the definitions of the cosine and sine (see section Trigonometry) give us: 

$$a = r\cos(\varphi) \qquad b = r\sin(\varphi)$$

with: 

$$r = \sqrt{a^2 + b^2} \qquad \varphi = \cos^{-1}\left(\frac{a}{r}\right)$$

so that: 

$$z = a + ib = r\cos(\varphi) + ir\sin(\varphi) = r(\cos(\varphi) + i\sin(\varphi)) = r\,\mathrm{cis}(\varphi) \qquad (2.106)$$

a complex number which is always equal to itself modulo $2\pi$ by the properties of trigonometric 
functions: 

$$z = r(\cos(\varphi) + i\sin(\varphi)) = r(\cos(\varphi + 2k\pi) + i\sin(\varphi + 2k\pi))$$




with $k \in \mathbb{N}$ and where $\varphi$ is named the "argument of z" and is traditionally denoted by: 

$$\arg(z) \qquad (2.108)$$

The properties of the cosine and sine (see section Trigonometry) lead us directly to write for the 
argument: 

$$\arg(\bar{z}) = -\arg(z) \quad \text{and} \quad \arg(-z) = \arg(z) + \pi \qquad (2.109)$$

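We also prove, among other things with the Taylor series (see section Sequences and Series), 
that: 

$$\cos(\varphi) = 1 - \frac{\varphi^2}{2!} + \frac{\varphi^4}{4!} - \ldots + (-1)^k\frac{\varphi^{2k}}{(2k)!} + \ldots$$

$$\sin(\varphi) = \varphi - \frac{\varphi^3}{3!} + \frac{\varphi^5}{5!} - \ldots + (-1)^k\frac{\varphi^{2k+1}}{(2k+1)!} + \ldots$$

whose sum is similar to: 

$$e^{\varphi} = 1 + \varphi + \frac{\varphi^2}{2!} + \frac{\varphi^3}{3!} + \ldots + \frac{\varphi^k}{k!} + \ldots$$

but is instead perfectly identical to the Taylor expansion of $e^{i\varphi}$: 

$$e^{i\varphi} = 1 + i\varphi - \frac{\varphi^2}{2!} - i\frac{\varphi^3}{3!} + \ldots + i^k\frac{\varphi^k}{k!} + \ldots = \cos(\varphi) + i\sin(\varphi)$$

So finally, we can write: 

$$z = r(\cos(\varphi) + i\sin(\varphi)) = re^{i\varphi}$$

a relation named "Euler's formula". 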

Using the properties of trigonometric functions: 

$$\cos(\varphi) + i\sin(\varphi) = e^{i\varphi}$$
$$\cos(\varphi) - i\sin(\varphi) = e^{-i\varphi} \qquad (2.112)$$

Depending on whether we add or subtract these two relations, this gives us the "Euler formulas" 
or "Moivre and Euler formulas": 

$$\cos(\varphi) = \frac{e^{i\varphi} + e^{-i\varphi}}{2}$$

$$\sin(\varphi) = \frac{e^{i\varphi} - e^{-i\varphi}}{2i}$$

Note that the angle can even be a complex number! This is to say that, in all generality, 
trigonometric functions can be considered as functions that go from $\mathbb{C}$ to $\mathbb{C}$. 
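
These relations can be checked numerically. The sketch below is a simple illustration, not a proof: it uses Python's standard `math` and `cmath` modules, the test angle `phi = 0.75` and the complex "angle" `w = 1 + 2j` are arbitrary choices, and the helper `exp_taylor` (a truncated Taylor sum) is a hypothetical name introduced here.

```python
import cmath
import math


def exp_taylor(z, terms=40):
    """Partial sum 1 + z + z**2/2! + ... of the exponential series."""
    total = 0
    term = 1  # current term z**k / k!
    for k in range(terms):
        total += term
        term = term * z / (k + 1)
    return total


phi = 0.75  # arbitrary test angle, in radians

# Euler's formula: the Taylor sum of e^{i phi} versus cos(phi) + i sin(phi)
euler_lhs = exp_taylor(1j * phi)
euler_rhs = complex(math.cos(phi), math.sin(phi))
print(abs(euler_lhs - euler_rhs) < 1e-12)

# Euler formulas for the cosine and the sine
cos_from_exp = (cmath.exp(1j * phi) + cmath.exp(-1j * phi)) / 2
sin_from_exp = (cmath.exp(1j * phi) - cmath.exp(-1j * phi)) / 2j

# The same cosine formula evaluated for a complex "angle"
w = 1 + 2j
cos_w = (cmath.exp(1j * w) + cmath.exp(-1j * w)) / 2
print(abs(cos_w - cmath.cos(w)) < 1e-12)
```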


Thanks to the exponential form of a complex number, very commonly used in many fields of 
physics and engineering, we can easily draw a number of relations, starting from (remember that 
cis is an old notation that stands for $\cos(\varphi) + i\sin(\varphi)$, the angle being in the parenthesis): 

$$z = r(\cos(\varphi) + i\sin(\varphi)) = r\,\mathrm{cis}(\varphi) = re^{i\varphi}$$

$$z_1 = r_1(\cos(\varphi_1) + i\sin(\varphi_1)) = r_1\,\mathrm{cis}(\varphi_1) = r_1e^{i\varphi_1} \qquad (2.117)$$

$$z_2 = r_2(\cos(\varphi_2) + i\sin(\varphi_2)) = r_2\,\mathrm{cis}(\varphi_2) = r_2e^{i\varphi_2}$$

and assuming the basic trigonometric identities are known (see section Trigonometry), we have the 
following relations for the multiplication of two complex numbers: 

$$z_1z_2 = r_1r_2[\cos(\varphi_1 + \varphi_2) + i\sin(\varphi_1 + \varphi_2)] = r_1e^{i\varphi_1}r_2e^{i\varphi_2} = r_1r_2\,\mathrm{cis}(\varphi_1 + \varphi_2) = r_1r_2e^{i(\varphi_1 + \varphi_2)}$$

and therefore for the argument: 

$$\arg(z_1z_2) = \arg(z_1) + \arg(z_2)$$

and therefore if n is a positive integer: 

$$\arg(z^n) = n\arg(z)$$

For the module (norm) of the multiplication: 

$$|z_1z_2| = |r_1e^{i\varphi_1}r_2e^{i\varphi_2}| = |r_1r_2e^{i(\varphi_1 + \varphi_2)}| = r_1r_2 = |z_1||z_2|$$

and therefore: 

$$|z^m| = |z|^m \qquad (2.121)$$


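For the division of two complex numbers: 

$$\frac{z_1}{z_2} = \frac{r_1}{r_2}[\cos(\varphi_1 - \varphi_2) + i\sin(\varphi_1 - \varphi_2)] = \frac{r_1}{r_2}\,\mathrm{cis}(\varphi_1 - \varphi_2) = \frac{r_1e^{i\varphi_1}}{r_2e^{i\varphi_2}} = \frac{r_1}{r_2}e^{i(\varphi_1 - \varphi_2)} \qquad (2.123)$$

The module of their division then comes immediately: 

$$\left|\frac{z_1}{z_2}\right| = \frac{r_1}{r_2} = \frac{|z_1|}{|z_2|}$$

therefore we have for the argument: 

$$\arg\left(\frac{z_1}{z_2}\right) = \arg\left(\frac{r_1e^{i\varphi_1}}{r_2e^{i\varphi_2}}\right) = \arg\left(\frac{r_1}{r_2}e^{i(\varphi_1 - \varphi_2)}\right) = \varphi_1 - \varphi_2 = \arg(z_1) - \arg(z_2) \qquad (2.125)$$

and it comes immediately: 

$$\arg(z^{-1}) = \arg\left((re^{i\varphi})^{-1}\right) = \arg\left(r^{-1}e^{-i\varphi}\right) = -\varphi = -\arg(z)$$

For the power of a complex number (or root): 

$$z^m = r^me^{im\varphi} = r^m[\cos(m\varphi) + i\sin(m\varphi)] = r^m\,\mathrm{cis}(m\varphi)$$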





which immediately gives us a result already proved previously: 

$$|z^m| = |z|^m$$

and for the argument: 

$$\arg(z^m) = \arg\left((re^{i\varphi})^m\right) = \arg\left(r^me^{im\varphi}\right) = m\varphi = m\arg(z) \qquad (2.129)$$

In the case where we have a unit module (norm equal to 1), that is $z = \cos(\varphi) + i\sin(\varphi)$, we then have the 
relation: 

$$(\cos(\varphi) + i\sin(\varphi))^m = (e^{i\varphi})^m = e^{im\varphi} = \cos(m\varphi) + i\sin(m\varphi) \qquad (2.130)$$

named "De Moivre formula". 

For the natural logarithm of a complex number, we trivially have the following relation, which 
is discussed in the section on Analysis: 

$$\ln(z) = \ln(re^{i\varphi}) = \ln(r) + i\varphi \qquad (2.131)$$

where ln(z) is often written, in the complex case, Log(z) with an uppercase "L". 

All previous relations could of course be obtained with the trigonometric form of complex 
numbers but would then require some additional lines of mathematical developments. 
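
The relations obtained with the exponential form can be verified numerically with Python's standard `cmath` module. This is only a sketch with arbitrarily chosen modules and angles; note also, as an assumption of this example, that `cmath.polar` and `cmath.log` return the principal argument in $]-\pi, \pi]$, so the additivity of arguments is observed here only because the chosen angles stay in that range (in general it holds modulo $2\pi$).

```python
import cmath
import math

z1 = cmath.rect(2.0, 0.5)  # r1 = 2, phi1 = 0.5 rad
z2 = cmath.rect(3.0, 1.0)  # r2 = 3, phi2 = 1.0 rad

# Multiplication: modules multiply, arguments add.
r_prod, phi_prod = cmath.polar(z1 * z2)

# Division: modules divide, arguments subtract.
r_quot, phi_quot = cmath.polar(z1 / z2)

# De Moivre formula for a complex number of unit module.
m = 5
unit = cmath.rect(1.0, 0.3)
de_moivre_lhs = unit ** m
de_moivre_rhs = complex(math.cos(m * 0.3), math.sin(m * 0.3))

# Natural logarithm (principal value): ln(r) + i*phi.
log_z1 = cmath.log(z1)

print(r_prod, phi_prod)
print(r_quot, phi_quot)
print(log_z1)
```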

2.2.7.1.1 Fresnel Vectors (phasors) 

A sinusoidal variation $f(t) = r\sin(\omega t)$ can be represented as the projection (see section 
Trigonometry) on the vertical y-axis (the imaginary axis of the set $\mathbb{C}$) of a vector $\vec{r}$ rotating at angular 
velocity $\omega$ around the origin in the plane xOy: 

Such a rotating vector is named a "Fresnel vector", and f(t) can be interpreted as the imaginary 
part of the complex number given by: 

$$\vec{r} = \Re(\vec{r}) + i\Im(\vec{r}) = r(\cos(\omega t) + i\sin(\omega t)) = re^{i\omega t} = re^{i\varphi} \qquad (2.132)$$

That is to say: 


Figure 4.13 - Fresnel rotating vector 

We will see the phasor again explicitly in our study of wave mechanics and geometrical optics 
(as part of diffraction) in the sections with the corresponding names. 
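
The link between the rotating vector and the sinusoidal signal can be sketched numerically. In this illustration the function name `phasor` and the numerical values (amplitude 2, angular velocity $\pi$ rad/s) are hypothetical choices; the only point is that the imaginary part of $re^{i\omega t}$ reproduces $r\sin(\omega t)$ at every instant.

```python
import cmath
import math


def phasor(r, omega, t):
    """Fresnel vector r*e^{i*omega*t} at time t, as a complex number."""
    return r * cmath.exp(1j * omega * t)


r, omega = 2.0, math.pi  # hypothetical amplitude and angular velocity

for t in (0.0, 0.25, 0.5, 1.0):
    z = phasor(r, omega, t)
    # The projection on the imaginary axis is the sinusoidal signal f(t).
    print(t, z.imag, r * math.sin(omega * t))
```

The module of the phasor stays equal to r at all times; only its argument $\omega t$ changes.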

2.2.7.2 Transformation in the plane 

It is customary to represent real numbers as points on a graduated line. The algebraic 
operations have their geometric interpretation on it: addition is a translation, multiplication 
a scaling centered at the origin. 

In particular we can talk about the "square root of a transformation". A translation of amplitude 
T may be obtained as the iteration of a translation of amplitude T/2. Similarly, a scaling 
of factor S can be achieved as the iteration of a scaling of factor $\sqrt{S}$. In particular, a homothety 
(scaling) of factor 9 can be composed of two homotheties (scalings) of factor 3 (or $-3$). 

Then we can say that the square root takes on a geometric sense. But what about the square root 
of negative numbers? In particular, the square root of $-1$? 

A scaling of factor $-1$ can be seen as a symmetry with respect to the origin. But if we look at this 
transformation in a continuous manner, a scaling of factor $-1$ can also be seen as a rotation 
of $\pi$ radians around the origin. 

So the problem of the negative square root is simplified. Indeed, it is not difficult to break down 
a rotation of $\pi$ radians into two transformations: we can repeat either a rotation of $\pi/2$ or one of 
$-\pi/2$. The image of 1 under such a rotation is the square root of $-1$, and i is therefore situated on 
the perpendicular through the origin, at a distance 1 either up or down. 

Having successfully positioned the number i, it is no longer difficult to place the other complex 
numbers in the Gauss plane. We can therefore associate with 2i the product of the scaling of factor 2 
(see section Euclidean Geometry) by the rotation of center O and angle $\pi/2$, that is to say a 
similitude centered at the origin. This is what we will endeavor to prove now. 

Let: 

$$z_1 = x_1 + iy_1 = ae^{i\alpha}, \qquad z_2 = x_2 + iy_2 = be^{i\beta} \qquad (2.133)$$

We have the following geometric transformation properties for complex numbers (see the 
section Trigonometry for the properties of sine and cosine), which we can happily combine as we wish: 



P1. The multiplication of $z_1$ by a real number $\lambda$ corresponds trivially, in the Gauss plane, to 
a homothety of center O (the intersection of the real and imaginary axes, as a reminder...) and of 
ratio $\lambda$: 

$$\lambda z_1 = (\lambda a)e^{i\alpha} \qquad (2.134)$$

P2. The multiplication of $z_1$ by a complex number $z_0$ of unit module corresponds to a rotation of center 
O and of angle equal to the argument $\omega$ of $z_0$. Indeed: 

$$z_0z_1 = e^{i\omega}ae^{i\alpha} = ae^{i(\alpha + \omega)} \qquad (2.135)$$

Then we see immediately, for example, that multiplying a complex number by i 
(that is to say by a complex number with $\sin(\omega) = 1$, $\cos(\omega) = 0$) corresponds to a 
rotation of $\pi/2$. 

Theorem 4.4. It is interesting to notice that in vector form the rotation of center O of $z_1$ 
by $z_0$ can be written using the following matrix: 

$$\begin{pmatrix} x_0 & -y_0 \\ y_0 & x_0 \end{pmatrix}$$

Proof 4.4.1. We have just seen that $z_0z_1$ is a rotation of $z_1$ of center O and angle $\omega$. 
We just need to write it first in the old style: 

$$z_0z_1 = (x_0 + iy_0)(x_1 + iy_1) = (x_0x_1 - y_0y_1) + i(x_0y_1 + y_0x_1)$$

giving in vector form: 

$$z_0z_1 = \begin{pmatrix} x_0x_1 - y_0y_1 \\ 0 \end{pmatrix} + i\begin{pmatrix} 0 \\ x_0y_1 + y_0x_1 \end{pmatrix}$$

thus the equivalent linear application is: 

$$\begin{pmatrix} x_0 & -y_0 \\ y_0 & x_0 \end{pmatrix}\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} x_0x_1 - y_0y_1 \\ x_0y_1 + y_0x_1 \end{pmatrix}$$

or as well (we fall back on the rotation matrix in the plane that we will see in the 
section on Euclidean Geometry, which is a remarkable result!) using: 

$$z = r(\cos(\varphi) + i\sin(\varphi)) = r(\cos(\varphi + k2\pi) + i\sin(\varphi + k2\pi))$$

and in the particular and arbitrary case where r is unitary (in order to have a pure rotation): 

$$z_0 = \cos(\varphi + k2\pi) + i\sin(\varphi + k2\pi) = x_0 + iy_0 \qquad (2.141)$$


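we have immediately (we have again taken the same notation for the angle as in the 
chapter Geometry): 

$$\begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} x_0x_1 - y_0y_1 \\ x_0y_1 + y_0x_1 \end{pmatrix} = \begin{pmatrix} \cos(\omega)x_1 - \sin(\omega)y_1 \\ \cos(\omega)y_1 + \sin(\omega)x_1 \end{pmatrix}$$

Note that the rotation matrix can also be written as: 

$$\begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix} = \cos(\omega)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \sin(\omega)\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \cos(\omega)I + \sin(\omega)J \qquad (2.143)$$

as well as: 

$$\begin{pmatrix} x_0 & -y_0 \\ y_0 & x_0 \end{pmatrix} = x_0\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + y_0\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = x_0 \cdot I + y_0 \cdot J$$

□ Q.E.D. 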

Thus we see that the rotation matrices are not only applications but are also complex 
numbers (well, it was obvious from the start, but we had to show it in an aesthetic and 
simple way). 

So, it is customary to put: 

$$1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad i = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

or with another common notation in linear algebra: 

$$1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad i = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

The field of complex numbers is isomorphic to the field of real square matrices of 
dimension 2 of the type: 

$$\begin{pmatrix} x_0 & -y_0 \\ y_0 & x_0 \end{pmatrix}$$

It is a result that we use many times in various sections of this book for specific studies in 
algebra, geometry and relativistic quantum physics. 
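
This correspondence between complex numbers and 2 x 2 real matrices can be checked on an example. The sketch below is only an illustration: the helper names `to_matrix`, `mat_mul` and `mat_add` are hypothetical, matrices are stored as tuples of rows, and the two test numbers are arbitrary.

```python
def to_matrix(z):
    """Matrix ((x, -y), (y, x)) associated with the complex number z = x + iy."""
    z = complex(z)
    return ((z.real, -z.imag), (z.imag, z.real))


def mat_mul(A, B):
    """Product of two 2x2 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def mat_add(A, B):
    """Sum of two 2x2 matrices stored as tuples of rows."""
    return tuple(tuple(A[i][j] + B[i][j] for j in range(2)) for i in range(2))


I = to_matrix(1)   # matrix playing the role of the number 1
J = to_matrix(1j)  # matrix playing the role of the number i

z0, z1 = 2 + 3j, 1 - 4j

# The matrix of a product is the product of the matrices, and likewise for sums.
print(mat_mul(to_matrix(z0), to_matrix(z1)) == to_matrix(z0 * z1))
print(mat_add(to_matrix(z0), to_matrix(z1)) == to_matrix(z0 + z1))

# J squared is minus the identity, just as i squared is -1.
print(mat_mul(J, J) == to_matrix(-1))
```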

P3. The multiplication of two complex numbers corresponds to a homothety composed with a rotation, in 
other words, a "direct similarity". 

Proof 4.4.2. 

$$z_1z_2 = ae^{i\alpha}be^{i\beta} = (ab)e^{i(\alpha + \beta)} \qquad (2.148)$$

so this is indeed a similarity of ratio b and angle $\beta$ applied to $z_1$. 

Conversely, the following operation: 

$$z_1\bar{z}_2 = ae^{i\alpha}be^{-i\beta} = (ab)e^{i(\alpha - \beta)} \qquad (2.149)$$


will be named a "retrograde linear similarity". 

Moreover, this trivially gives back an already known relation: 

$$\arg(z_1z_2) = \arg(z_1) + \arg(z_2) \qquad (2.150)$$

R1. As the sum of two complex numbers $z_1 + z_2$ cannot have a special simplified 
mathematical notation in any form whatsoever, we say that the resulting 
quantity is equivalent to an "amplitude translation". 

R2. The combination of a direct linear similarity (multiplication of two complex 
numbers) and an amplitude translation (sum with a third complex number) is what 
we name a "direct linear similarity". 


□ Q.E.D. 

P4. The conjugate of a complex number is geometrically its symmetrical with respect to the real axis, 
such that: 

$$\bar{z}_1 = x_1 - iy_1 = r(\cos(\alpha) - i\sin(\alpha)) = r(\cos(-\alpha) + i\sin(-\alpha)) = re^{-i\alpha} \qquad (2.151)$$

without forgetting that (basis of trigonometry): 

$$\cos(\varphi) = \cos(\varphi + k2\pi) \qquad \sin(\varphi) = \sin(\varphi + k2\pi) \qquad (2.152)$$

This gives us a known result: 

$$\arg(\bar{z}_1) = -\arg(z_1) \qquad (2.153)$$

From which we get the following property: 

$$r(\cos(\varphi + \pi) + i\sin(\varphi + \pi)) = r(-\cos\varphi - i\sin\varphi) = -r(\cos\varphi + i\sin\varphi) = -z_1$$

that is to say: 

$$\arg(z_1) + \pi = \arg(-z_1) \qquad (2.155)$$

P5. The negation of the conjugate of a complex number is geometrically its symmetrical with 
respect to the imaginary axis, such that: 

$$-\bar{z}_1 = -x_1 + iy_1 = r(-\cos\alpha + i\sin\alpha) = r(\cos(\pi \pm \alpha) + i\sin\alpha) \qquad (2.156)$$

R1. The combination of the properties P4 and P5 is named a "retrograde similarity". 

R2. The geometric operation that consists in taking the inverse of the conjugate of a 
complex number (that is to say $\bar{z}^{-1}$) is named a "pole inversion". 



P6. The rotation of center c and angle $\varphi$ is given and denoted by: 

$$\mathcal{R}(z_1) = c + e^{i\varphi}(z_1 - c) \qquad (2.157)$$

Some explanations could be useful for some readers: 

The complex number c gives a point in the Gaussian plane, which will be the center of rotation. 
The difference $z_1 - c$ gives the chosen radius r. The multiplication by $e^{i\varphi}$ is the 
counterclockwise rotation of the radius about the origin of the Gaussian plane. Finally, the 
addition of c is the translation needed to bring the rotated radius r back to its original 
place before the rotation (center c). Which gives schematically: 

Figure 4.14 - Representation of the complex rotation 

P7. In the same way, we get and denote a homothety of center c and ratio $\lambda$ by: 

$$\mathcal{H}(z_1) = c + \lambda(z_1 - c) \qquad (2.158)$$

Some explanations could be useful for some readers: 

The difference $z_1 - c$ again gives the radius r, and c a central point in the Gauss plane. 
The expression $\lambda(z_1 - c)$ gives the homothety of the radius from the origin of the Gaussian 
plane, and finally adding c gives the translation needed for the homothety to be seen 
as being made from center c. 
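
The two formulas (2.157) and (2.158) translate directly into code. The following sketch uses Python's built-in complex numbers; the function names `rotate` and `homothety` and the sample points are hypothetical choices made for the illustration.

```python
import cmath
import math


def rotate(z, c, phi):
    """Rotation of z of center c and angle phi (counterclockwise, in radians)."""
    return c + cmath.exp(1j * phi) * (z - c)


def homothety(z, c, lam):
    """Homothety of z of center c and real ratio lam."""
    return c + lam * (z - c)


c = 1 + 1j   # center
z1 = 2 + 1j  # point located at distance 1 to the right of the center

print(rotate(z1, c, math.pi / 2))  # close to 1 + 2j: the point is now above the center
print(homothety(z1, c, 3))         # (4+1j): three times farther from the center
```

A rotation leaves the distance to the center unchanged, while a homothety multiplies it by the absolute value of the ratio.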



2.2.8 Quaternion Numbers 

Also named "hypercomplex" quaternions numbers were invented in 1843 by William Rowan 
Hamilton to generalize complex numbers. 

Definition (#27): A "quaternion" is an element (a, b, c, d) G M 4 and for which we denote by H 
the set that contains it and what we name the "set of quaternions". 

A quaternion can also be represented in a row or column such as: 

(a, 6, c, d) = 

/ a' 

1 b 



We define the sum of two quaternions $(a, b, c, d)$ and $(a', b', c', d')$ by:

$$(a, b, c, d) + (a', b', c', d') = (a + a', b + b', c + c', d + d')$$

It is the natural addition in $\mathbb{R}^4$ seen as an $\mathbb{R}$-vector space (see section Set Theory).

The associativity is verified by applying the corresponding properties of the operations on $\mathbb{R}$.
We also define the multiplication:

$$(a, b, c, d) \cdot (a', b', c', d') \tag{2.161}$$

of two quaternions $(a, b, c, d)$ and $(a', b', c', d')$ by the expression:

$$(a, b, c, d) \cdot (a', b', c', d') =
\begin{pmatrix}
aa' - bb' - cc' - dd' \\
(ab' + ba') + (cd' - dc') \\
(ac' + ca') - (bd' - db') \\
(da' + ad') + (bc' - cb')
\end{pmatrix}$$


It may be hard to accept, but we will see a little further below that there is a family resemblance
with the complex numbers.

We can notice that the law of multiplication is not commutative. Indeed, taking the definition
of the multiplication above, we have:

$$(0, 1, 0, 0) \cdot (0, 0, 1, 0) = (0, 0, 0, 1)$$

$$(0, 0, 1, 0) \cdot (0, 1, 0, 0) = (0, 0, 0, -1)$$

But we can also notice that:

$$(0, 1, 0, 0) \cdot (0, 0, 1, 0) = -1 \, (0, 0, 1, 0) \cdot (0, 1, 0, 0)$$
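
The definition of the product can be transcribed component by component. The sketch below is
written in Python with quaternions stored as plain 4-tuples; the helper name `qmul` is an
assumption made for illustration only.

```python
def qmul(p, q):
    """Product of two quaternions given as tuples (a, b, c, d), following the definition above."""
    a, b, c, d = p
    a2, b2, c2, d2 = q
    return (
        a * a2 - b * b2 - c * c2 - d * d2,
        (a * b2 + b * a2) + (c * d2 - d * c2),
        (a * c2 + c * a2) - (b * d2 - d * b2),
        (d * a2 + a * d2) + (b * c2 - c * b2),
    )


first = (0, 1, 0, 0)
second = (0, 0, 1, 0)

left = qmul(first, second)    # (0, 0, 0, 1)
right = qmul(second, first)   # (0, 0, 0, -1)
print(left, right)
```

Swapping the two factors changes the sign of the result, which is the non-commutativity noticed above.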





The law of multiplication is distributive over the addition law, but it is an excellent example
where we must still be careful to prove both the left and the right distributivity, since the
product is not commutative! The multiplication has the neutral element:

$$(1, 0, 0, 0) \tag{2.165}$$

Indeed:

$$(1, 0, 0, 0) \cdot (a, b, c, d) = (a, b, c, d) \cdot (1, 0, 0, 0) = (a, b, c, d)$$

Any element:

$$(a, b, c, d) \in \mathbb{H}^* = \mathbb{H} - \{(0, 0, 0, 0)\}$$

is invertible.

Indeed, if $(a, b, c, d)$ is a non-null quaternion, we then necessarily have:

$$a^2 + b^2 + c^2 + d^2 \neq 0$$

otherwise the four numbers $a, b, c, d$ would all have a null square, and would therefore all be zero.
Consider then the quaternion $(a_1, b_1, c_1, d_1)$ defined by:

$$a_1 = \frac{a}{a^2 + b^2 + c^2 + d^2}, \quad
b_1 = \frac{-b}{a^2 + b^2 + c^2 + d^2}, \quad
c_1 = \frac{-c}{a^2 + b^2 + c^2 + d^2}, \quad
d_1 = \frac{-d}{a^2 + b^2 + c^2 + d^2}$$

then by mechanically applying the definition of the multiplication of quaternions, we check that:

$$(a, b, c, d) \cdot (a_1, b_1, c_1, d_1) = (a_1, b_1, c_1, d_1) \cdot (a, b, c, d) = (1, 0, 0, 0)$$

This latter quaternion is therefore the inverse for the multiplication!
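
The "mechanical" check can be delegated to a few lines of Python. This is only a sketch: the
names `qmul` and `qinv` are assumptions, and exact rational arithmetic (`fractions.Fraction`)
is used so that the product is exactly the neutral element.

```python
from fractions import Fraction


def qmul(p, q):
    """Product of two quaternions given as tuples (a, b, c, d)."""
    a, b, c, d = p
    a2, b2, c2, d2 = q
    return (
        a * a2 - b * b2 - c * c2 - d * d2,
        (a * b2 + b * a2) + (c * d2 - d * c2),
        (a * c2 + c * a2) - (b * d2 - d * b2),
        (d * a2 + a * d2) + (b * c2 - c * b2),
    )


def qinv(q):
    """Inverse of a non-null quaternion: (a, -b, -c, -d) divided by a^2 + b^2 + c^2 + d^2."""
    a, b, c, d = (Fraction(x) for x in q)
    n2 = a * a + b * b + c * c + d * d
    if n2 == 0:
        raise ZeroDivisionError("the null quaternion has no inverse")
    return (a / n2, -b / n2, -c / n2, -d / n2)


q = (1, 2, 3, 4)
q_inv = qinv(q)
print(qmul(q, q_inv), qmul(q_inv, q))
```

Both products reduce to the neutral element (1, 0, 0, 0), on the left and on the right.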

Let us prove (for general knowledge) that the field of complex numbers $(\mathbb{C}, +, \times)$ is a
subfield of $(\mathbb{H}, +, \times)$.

We could also have put this proof in the section of Set Theory, because we will make use
of a lot of concepts that we have seen there, but it seemed to us a little more relevant to
put the proof here. We expect the reader to tolerate this choice.

Let $\mathbb{H}'$ be the set of quaternions of the form $(a, b, 0, 0)$. $\mathbb{H}'$ is not empty, and if
$(a, b, 0, 0)$, $(a', b', 0, 0)$ are elements of $\mathbb{H}'$, then $(\mathbb{H}', +, \times)$ is a field. Indeed:



P1. For subtraction (and therefore the addition):

$$(a, b, 0, 0) - (a', b', 0, 0) = (a - a', b - b', 0, 0) \in \mathbb{H}'$$

P2. The multiplication:

$$(a, b, 0, 0) \cdot (a', b', 0, 0) = (aa' - bb', ab' + ba', 0, 0) \in \mathbb{H}' \tag{2.172}$$

P3. The neutral element:

$$(1, 0, 0, 0) \in \mathbb{H}'$$

P4. And finally the inverse:

$$\left(\frac{a}{a^2 + b^2}, \frac{-b}{a^2 + b^2}, 0, 0\right) \in \mathbb{H}'$$

of a non-null $(a, b, 0, 0)$ is still in $\mathbb{H}'$.



Therefore $(\mathbb{H}', +, \times)$ is a subfield of $\mathbb{H}$. Consider then the application:

$$f : \mathbb{C} \to \mathbb{H}', \qquad a + ib \mapsto (a, b, 0, 0)$$

$f$ is bijective, and we easily check that for any complex numbers $z_1, z_2$ we have:

$$f(z_1 + z_2) = f(z_1) + f(z_2)$$
$$f(z_1 z_2) = f(z_1) f(z_2)$$

Therefore $f$ is an isomorphism of $(\mathbb{C}, +, \times)$ onto $(\mathbb{H}', +, \times)$.

The interest of this isomorphism is to identify $\mathbb{C}$ with $\mathbb{H}'$ and to write
$\mathbb{C} \subset \mathbb{H}$, the laws of addition and multiplication on $\mathbb{H}$ extending the
already known operations of $\mathbb{C}$.

Thus, by convention, we will write any element $(a, b, 0, 0)$ of $\mathbb{H}'$ in the complex form
$a + ib$. In particular, 0 is the element $(0, 0, 0, 0)$, 1 is the element $(1, 0, 0, 0)$ and $i$ is the
element $(0, 1, 0, 0)$.

We denote by analogy and by extension $j$ the element $(0, 0, 1, 0)$ and $k$ the element $(0, 0, 0, 1)$.
The family $\{1, i, j, k\}$ forms a basis of the set of quaternions seen as a vector space over
$\mathbb{R}$, and we will write:

$$a + bi + cj + dk$$

for the quaternion $(a, b, c, d)$.

The notation of quaternions as defined above is perfectly suited to the multiplication operation.
For the product of two quaternions we get, by developing the expression:

$$(a + bi + cj + dk) \cdot (a' + b'i + c'j + d'k) \tag{2.178}$$

16 terms that we have to identify with the original definition of the multiplication of quaternions
to get the following relations:

$$\begin{aligned}
i \cdot j &= k = -j \cdot i \\
j \cdot k &= i = -k \cdot j \\
k \cdot i &= j = -i \cdot k \\
i^2 &= j^2 = k^2 = -1
\end{aligned} \tag{2.179}$$

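Which can be summarized in a table (element of the row multiplied by element of the column):

  .  |   1    i    j    k
 ----+---------------------
  1  |   1    i    j    k
  i  |   i   -1    k   -j
  j  |   j   -k   -1    i
  k  |   k    j   -i   -1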



We can see that the expression of the multiplication of two quaternions looks in part very much like
a vector product (denoted $\times$ in this book) and a dot product (denoted $\circ$ in this book):

$$\begin{pmatrix}
aa' - bb' - cc' - dd' \\
(ab' + ba') + (cd' - dc') \\
(ac' + ca') - (bd' - db') \\
(da' + ad') + (bc' - cb')
\end{pmatrix}$$


If this is not evident (which would be quite understandable), let us take a concrete example:

Given two quaternions without real part:

$$p = xi + yj + zk \qquad q = x'i + y'j + z'k \tag{2.182}$$

and $u, v$ the vectors of $\mathbb{R}^3$ of respective components $(x, y, z)$ and $(x', y', z')$. Then the
product:

$$pq = (0, u)(0, v) \tag{2.183}$$

is equal to:

$$p \cdot q = (-xx' - yy' - zz', \; yz' - zy', \; -xz' + zx', \; xy' - yx') = (-u \circ v, \; u \times v)$$


We can also, out of curiosity, look at the general case. Given for this two quaternions:

$$p = (a, u) \qquad q = (b, v) \tag{2.185}$$

Then we have:

$$\begin{aligned}
p \cdot q &= (a + (0, u)) \cdot (b + (0, v)) \\
&= ab + (0, av) + (0, bu) + (-u \circ v, \; u \times v) \\
&= (ab - u \circ v, \; av + bu + u \times v)
\end{aligned} \tag{2.186}$$
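
The scalar/vector form of the product can be compared numerically with the component-wise
definition. The following Python sketch does so on one integer example; the helper names
(`qmul`, `dot`, `cross`, `qmul_scalar_vector`) are assumptions introduced for this illustration.

```python
def qmul(p, q):
    """Component-wise product of two quaternions (a, b, c, d)."""
    a, b, c, d = p
    a2, b2, c2, d2 = q
    return (
        a * a2 - b * b2 - c * c2 - d * d2,
        (a * b2 + b * a2) + (c * d2 - d * c2),
        (a * c2 + c * a2) - (b * d2 - d * b2),
        (d * a2 + a * d2) + (b * c2 - c * b2),
    )


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def qmul_scalar_vector(a, u, b, v):
    """Product (a, u)(b, v) = (ab - u.v, av + bu + u x v)."""
    w = cross(u, v)
    vector = tuple(a * vi + b * ui + wi for ui, vi, wi in zip(u, v, w))
    return (a * b - dot(u, v),) + vector


component_form = qmul((1, 2, 3, 4), (5, 6, 7, 8))
vector_form = qmul_scalar_vector(1, (2, 3, 4), 5, (6, 7, 8))
print(component_form, vector_form)
```

Both computations give the same quaternion, here -60 + 12i + 30j + 24k.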



Definition (#28): The center of the non-commutative field $(\mathbb{H}, +, \times)$ is the set of
elements of $\mathbb{H}$ commuting, for the law of multiplication, with all the elements of $\mathbb{H}$.

Theorem 4.5. The center of $(\mathbb{H}, +, \times)$ is the set of real numbers!

Proof 4.5.1. Let $\mathbb{H}_1$ be the center of $(\mathbb{H}, +, \times)$, and $(x, y, z, t)$ a quaternion.
The following condition must be met:

Given $(x, y, z, t) \in \mathbb{H}_1$, then for any $(a, b, c, d) \in \mathbb{H}$ we require:

$$(x, y, z, t) \cdot (a, b, c, d) = (a, b, c, d) \cdot (x, y, z, t)$$

which gives by developing:

$$\begin{aligned}
xa - yb - zc - td &= ax - by - cz - dt \\
xb + ya + zd - tc &= ay + bx + ct - dz \\
xc + za - yd + tb &= az + cx - bt + dy \\
ta + xd + yc - zb &= dx + at + bz - cy
\end{aligned}$$

after simplification (the first line of the previous system is identical on both sides of the
equality):

$$\begin{aligned}
ct - dz &= 0 \\
bt - dy &= 0 \\
bz - cy &= 0
\end{aligned} \tag{2.189}$$

Since this system must hold for every choice of $b, c, d$, its resolution gives us $y = z = t = 0$.
So for the quaternion $(x, y, z, t)$ to be in the center of $\mathbb{H}$, it must be real (no
imaginary parts)!

□ Q.E.D.



Just as for complex numbers, we can define a conjugate of quaternions:

Definition (#29): The conjugate of a quaternion $Z = (a, b, c, d)$ is the quaternion
$\bar{Z} = (a, -b, -c, -d)$.

Just as for complex numbers, we notice that:

1. First, clearly, if $Z = \bar{Z}$ then $Z \in \mathbb{R}$.

2. That $Z + \bar{Z} \in \mathbb{R}$.

3. That by developing the product $Z\bar{Z}$ we have:

$$\begin{aligned}
Z\bar{Z} &= (a, b, c, d) \cdot (a, -b, -c, -d) \\
&= (a^2 + b^2 + c^2 + d^2, \; -ab + ba - cd + dc, \; -ac + ca + bd - db, \; da - ad - bc + cb) \\
&= (a^2 + b^2 + c^2 + d^2, 0, 0, 0) = a^2 + b^2 + c^2 + d^2 \in \mathbb{R}
\end{aligned}$$

that we will adopt, by analogy with complex numbers, as the definition of the norm (or
module) of quaternions:

$$|Z| = \sqrt{Z \cdot \bar{Z}} \tag{2.191}$$

Therefore we also have immediately (a relation which will be useful later):

$$|ZZ'| = \sqrt{(ZZ') \, \overline{(ZZ')}} \tag{2.192}$$

As for complex numbers (see above), it is easy to show that the conjugation is an automorphism
of the group $(\mathbb{H}, +)$:

$$\begin{aligned}
\overline{Z + Z'} &= (a + a', \; -b - b', \; -c - c', \; -d - d') \\
&= (a, -b, -c, -d) + (a', -b', -c', -d') \\
&= \bar{Z} + \bar{Z'}
\end{aligned} \tag{2.193}$$

It is also easy to prove that it is involutive. Indeed:

$$\bar{\bar{Z}} = (a, -(-b), -(-c), -(-d)) = (a, b, c, d) = Z \tag{2.194}$$

But the conjugation is not a multiplicative automorphism of the field $(\mathbb{H}, +, \times)$. Indeed, if we
consider the product of $Z$ and $Z'$ and take its conjugate:

$$ZZ' = \begin{pmatrix}
aa' - bb' - cc' - dd' \\
(ab' + ba') + (cd' - dc') \\
(ac' + ca') - (bd' - db') \\
(da' + ad') + (bc' - cb')
\end{pmatrix}
\qquad
\overline{ZZ'} = \begin{pmatrix}
aa' - bb' - cc' - dd' \\
-(ab' + ba') - (cd' - dc') \\
-(ac' + ca') + (bd' - db') \\
-(da' + ad') - (bc' - cb')
\end{pmatrix}$$

we see immediately (at least for the second row, since the second component of $\bar{Z}\bar{Z'}$ is
$-(ab' + ba') + (cd' - dc')$) that we have:

$$\overline{ZZ'} \neq \bar{Z} \, \bar{Z'}$$


Let us now come back to our norm (or module). For this, let us calculate the square of the norm:

$$|ZZ'|^2 = (ZZ') \cdot \overline{(ZZ')}$$

We know (by definition) that:

$$ZZ' = \begin{pmatrix}
aa' - bb' - cc' - dd' \\
(ab' + ba') + (cd' - dc') \\
(ac' + ca') - (bd' - db') \\
(da' + ad') + (bc' - cb')
\end{pmatrix}$$

Let us denote this product in such a way that:

$$ZZ' = (\alpha, \beta, \gamma, \lambda) = Z''$$

Then we have:

$$|Z''|^2 = Z'' \, \overline{Z''} = \alpha^2 + \beta^2 + \gamma^2 + \lambda^2$$

Substituting, it comes:

$$\begin{aligned}
(ZZ') \cdot \overline{(ZZ')} = \; & (aa' - bb' - cc' - dd')^2 + (ab' + ba' + cd' - dc')^2 \\
& + (ac' + ca' - bd' + db')^2 + (da' + ad' + bc' - cb')^2
\end{aligned}$$

After an elementary algebraic development (frankly boring) we find:

$$(ZZ') \cdot \overline{(ZZ')} = (a'^2 + b'^2 + c'^2 + d'^2)(a^2 + b^2 + c^2 + d^2) = |Z'|^2 |Z|^2$$

that is to say:

$$|ZZ'|^2 = (ZZ') \cdot \overline{(ZZ')} = |Z|^2 |Z'|^2$$

The norm is therefore a homomorphism of $(\mathbb{H}, \times)$ in $(\mathbb{R}, \times)$. Subsequently, we will
denote by $G$ the set of all the quaternions of unit norm.
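
The "frankly boring" development, and the behaviour of the conjugation with respect to the
product, can be checked on an integer example. The Python sketch below is only an illustration;
the helper names `qmul`, `qconj` and `norm2` are assumptions.

```python
def qmul(p, q):
    """Product of two quaternions given as tuples (a, b, c, d)."""
    a, b, c, d = p
    a2, b2, c2, d2 = q
    return (
        a * a2 - b * b2 - c * c2 - d * d2,
        (a * b2 + b * a2) + (c * d2 - d * c2),
        (a * c2 + c * a2) - (b * d2 - d * b2),
        (d * a2 + a * d2) + (b * c2 - c * b2),
    )


def qconj(q):
    """Conjugate (a, -b, -c, -d)."""
    return (q[0], -q[1], -q[2], -q[3])


def norm2(q):
    """Square of the norm, a^2 + b^2 + c^2 + d^2."""
    return sum(x * x for x in q)


Z = (1, 2, 3, 4)
Zp = (5, 6, 7, 8)
product = qmul(Z, Zp)

# The squared norm is multiplicative: |ZZ'|^2 = |Z|^2 |Z'|^2.
print(norm2(product), norm2(Z) * norm2(Zp))

# Conjugation does not respect the order of the factors ...
print(qconj(product) == qmul(qconj(Z), qconj(Zp)))
# ... but it does when the two factors are swapped.
print(qconj(product) == qmul(qconj(Zp), qconj(Z)))
```

On this example the conjugate of the product equals the product of the conjugates taken in the
reverse order, a standard property of quaternions that is used further below.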

2.2.8.1 Matrix Interpretation of Quaternions

Given $q$ and $p$ two quaternions and given the application:

$$p \mapsto qp$$

The (left) multiplication can be made with a linear application (see section Linear Algebra) on
$\mathbb{H}$.

If $q$ is written:

$$a + bi + cj + dk$$

this application has for matrix in the basis $1, i, j, k$:

$$\begin{pmatrix}
a & -b & -c & -d \\
b & a & -d & c \\
c & d & a & -b \\
d & -c & b & a
\end{pmatrix}$$

Which we can check:

$$\begin{pmatrix}
a & -b & -c & -d \\
b & a & -d & c \\
c & d & a & -b \\
d & -c & b & a
\end{pmatrix}
\begin{pmatrix} a' \\ b' \\ c' \\ d' \end{pmatrix}
= ZZ' =
\begin{pmatrix}
aa' - bb' - cc' - dd' \\
(ab' + ba') + (cd' - dc') \\
(ac' + ca') - (bd' - db') \\
(da' + ad') + (bc' - cb')
\end{pmatrix}$$

In fact, we could then define the quaternions as the set of matrices with the structure visible above
if we wanted to. This would then reduce them to a vector subspace of $M_4(\mathbb{R})$.
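
As an illustration of this matrix point of view, the following Python sketch builds the matrix
of the left multiplication and compares it with the component-wise product. The helper names
(`left_matrix`, `matvec`, `matmul`, `qmul`) are assumptions and the matrices are plain nested lists.

```python
def qmul(p, q):
    """Product of two quaternions given as tuples (a, b, c, d)."""
    a, b, c, d = p
    a2, b2, c2, d2 = q
    return (
        a * a2 - b * b2 - c * c2 - d * d2,
        (a * b2 + b * a2) + (c * d2 - d * c2),
        (a * c2 + c * a2) - (b * d2 - d * b2),
        (d * a2 + a * d2) + (b * c2 - c * b2),
    )


def left_matrix(q):
    """Matrix of the application p -> q p in the basis 1, i, j, k."""
    a, b, c, d = q
    return [[a, -b, -c, -d],
            [b, a, -d, c],
            [c, d, a, -b],
            [d, -c, b, a]]


def matvec(m, v):
    return tuple(sum(m[r][k] * v[k] for k in range(4)) for r in range(4))


def matmul(m, n):
    return [[sum(m[r][k] * n[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


q = (1, 2, 3, 4)
p = (5, 6, 7, 8)
print(matvec(left_matrix(q), p), qmul(q, p))      # same quaternion twice

M_i = left_matrix((0, 1, 0, 0))
M_j = left_matrix((0, 0, 1, 0))
M_k = left_matrix((0, 0, 0, 1))
print(matmul(M_i, M_j) == M_k)                    # the matrices reproduce i j = k
```

The matrix product of the matrices of $i$ and $j$ gives the matrix of $k$, so the relations between
$i$, $j$, $k$ survive the passage to matrices.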

In particular, the matrix of 1 (the real part of the quaternion $q$) is then nothing other than the
identity matrix:

$$M_1 = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix} = 1$$


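as well:

$$M_i = \begin{pmatrix}
0 & -1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0
\end{pmatrix}, \quad
M_j = \begin{pmatrix}
0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0
\end{pmatrix}, \quad
M_k = \begin{pmatrix}
0 & 0 & 0 & -1 \\
0 & 0 & -1 & 0 \\
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0
\end{pmatrix} \tag{2.209}$$

Rotations with Quaternions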

We will see now that the conjugation by an element of the group $G$ of the quaternions of
unit norm can be interpreted as a pure rotation in space!

Definition (#30): The "conjugation" by a non-null quaternion $q$ of unit norm is the application
$S_q$ defined on $\mathbb{H}$ by:

$$S_q : p \mapsto q \cdot p \cdot q^{-1} = q \cdot p \cdot \bar{q}$$

and we affirm that this application is a rotation.



R1. As $q$ is of unit norm, we obviously have $|q|^2 = q\bar{q} = 1$, therefore $q^{-1} = \bar{q}$. This
quaternion can be seen as the proper value (of unit norm) of the application (matrix) $p$ on
the vector $q$ (we are in a situation similar to that of the orthogonal rotation matrices seen
in the section Linear Algebra).

R2. $S_q$ is a linear application (so if it is a rotation, the rotation can be decomposed into
several rotations). Indeed, let us consider two quaternions $p_1, p_2$ and two real numbers
$\lambda_1, \lambda_2$, then we have:

$$S_q(\lambda_1 p_1 + \lambda_2 p_2) = q(\lambda_1 p_1 + \lambda_2 p_2)\bar{q}
= \lambda_1 q p_1 \bar{q} + \lambda_2 q p_2 \bar{q}
= \lambda_1 S_q(p_1) + \lambda_2 S_q(p_2) \tag{2.211}$$

Let us now check that the application is indeed a pure rotation. As we saw in our study of Linear
Algebra and in particular of orthogonal matrices (see section Linear Algebra), a first obvious
condition is that the application conserves the norm.

Let us check this:

$$|S_q(p)| = |q p \bar{q}| = |q| |p| |\bar{q}| = |p| \tag{2.212}$$

Moreover, we can check that the sum of the rotation of a purely complex quaternion (we then
restrict ourselves to $\mathbb{R}^3$) and of its conjugate is zero (a vector added to its
opposite cancels):

$$S_q(p) + \overline{S_q(p)} = q p \bar{q} + \overline{q p \bar{q}} = q p \bar{q} + \overline{q (p \bar{q})} \tag{2.213}$$

We trivially check that for two quaternions $q, p$ we have $\overline{p \cdot q} = \bar{q} \, \bar{p}$,
so that:

$$\begin{aligned}
S_q(p) + \overline{S_q(p)} &= q p \bar{q} + \overline{q (p \bar{q})} = q p \bar{q} + \overline{(p \bar{q})} \, \bar{q} \\
&= q \cdot p \cdot \bar{q} + q \cdot \bar{p} \cdot \bar{q} \\
&= S_q(p + \bar{p})
\end{aligned} \tag{2.214}$$

For this sum to be zero, we immediately see that we need to restrict ourselves to the purely
complex quaternions $p$. Since then:

$$S_q(p + \bar{p}) = S_q(0) = 0 \tag{2.215}$$

We conclude then that $p$ must be purely complex for the application $S_q$ to be a rotation, and
that $S_q(p)$ is then a pure quaternion. In other words, the set of pure quaternions is stable under
this application (a pure quaternion remains a pure quaternion).

The application $S_q$ restricted to the set of purely complex quaternions is thus a vectorial
isometry, that is to say a symmetry or a rotation.

We have also seen during our study of the rotation matrices in the sections Linear Algebra and
Euclidean Geometry that such matrices must have a determinant equal to 1 for us to have a
rotation. Let us see if this is the case for $S_q$:

For this, we explicitly calculate, as a function of:

$$q = a + bi + cj + dk$$

the matrix (in the canonical basis $(i, j, k)$) of $S_q$, and we calculate its determinant. Thus we
obtain the coefficients of the columns of this application by remembering that:

$$\begin{aligned}
ij &= k = -ji \\
jk &= i = -kj \\
ki &= j = -ik \\
i^2 &= j^2 = k^2 = -1
\end{aligned}$$

and then by calculating:

$$\begin{aligned}
S_q(i) &= (a + bi + cj + dk) \, i \, (a - bi - cj - dk) \\
&= (ai + b(i^2) + c(ji) + d(ki))(a - bi - cj - dk) \\
&= (ai - b - ck + dj)(a - bi - cj - dk) \\
&= (a^2 i + ab - ack + adj) - (ba - b^2 i - bcj - bdk) \\
&\quad - (cak - cbj + c^2 i + cd) + (daj + dbk + cd - d^2 i) \\
&= (a^2 + b^2 - c^2 - d^2) i + (-ack + adj) - (-bcj - bdk) - (cak - cbj) + (daj + dbk) \\
&= (a^2 + b^2 - c^2 - d^2) i + 2(ad + bc) j + 2(bd - ac) k
\end{aligned}$$

$$\begin{aligned}
S_q(j) &= (a + bi + cj + dk) \, j \, (a - bi - cj - dk) \\
&= (aj + b(ij) + c(j^2) + d(kj))(a - bi - cj - dk) \\
&= (aj + bk - c - di)(a - bi - cj - dk) \\
&= (a^2 j + abk + ac - adi) + (bak - b^2 j + bci + bd) \\
&\quad - (ca - cbi - c^2 j - cdk) - (dai + db - dck + d^2 j) \\
&= (a^2 - b^2 + c^2 - d^2) j + (abk - adi) + (bak + bci) - (-cbi - cdk) - (dai - dck) \\
&= 2(bc - ad) i + (a^2 - b^2 + c^2 - d^2) j + 2(ab + cd) k
\end{aligned}$$

$$\begin{aligned}
S_q(k) &= (a + bi + cj + dk) \, k \, (a - bi - cj - dk) \\
&= (ak + b(ik) + c(jk) + d(k^2))(a - bi - cj - dk) \\
&= (ak - bj + ci - d)(a - bi - cj - dk) \\
&= (a^2 k - abj + aci + ad) - (abj + b^2 k + bc - bdi) \\
&\quad + (cai + cb - c^2 k + cdj) - (da - dbi - dcj - d^2 k) \\
&= (a^2 - b^2 - c^2 + d^2) k + (-abj + aci) - (abj - bdi) + (cai + cdj) - (-dbi - dcj) \\
&= 2(ac + bd) i + 2(cd - ab) j + (a^2 - b^2 - c^2 + d^2) k
\end{aligned}$$

We must then calculate the determinant of the following matrix (pfff ...):

$$\begin{pmatrix}
a^2 + b^2 - c^2 - d^2 & 2(ad + bc) & 2(bd - ac) \\
2(bc - ad) & a^2 - b^2 + c^2 - d^2 & 2(ab + cd) \\
2(ac + bd) & 2(cd - ab) & a^2 - b^2 - c^2 + d^2
\end{pmatrix} \tag{2.220}$$

remembering that (which also simplifies the expression of the terms of the diagonal, as we can
see in some books):

$$a^2 + b^2 + c^2 + d^2 = 1 \tag{2.222}$$

we find that the determinant is indeed equal to 1. Otherwise, we can check this with Maple:

>with(linalg):
>A:=linalg[matrix](3,3,[a^2+b^2-c^2-d^2,2*(a*d+b*c),
2*(b*d-a*c),2*(b*c-a*d),a^2-b^2+c^2-d^2,2*(a*b+c*d),
2*(a*c+b*d),2*(c*d-a*b),a^2-b^2-c^2+d^2]);
>factor(det(A));
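
For readers without Maple, the same check can be sketched in plain Python. The function names
`rotation_matrix` and `det3` are assumptions made for this example. Since every entry of the
matrix is homogeneous of degree 2 in $(a, b, c, d)$, the determinant of a non-unit quaternion
is expected to be $(a^2 + b^2 + c^2 + d^2)^3$, which reduces to 1 on the unit quaternions.

```python
def rotation_matrix(q):
    """3x3 matrix of the application S_q, written row by row as in the text."""
    a, b, c, d = q
    return [
        [a*a + b*b - c*c - d*d, 2 * (a*d + b*c), 2 * (b*d - a*c)],
        [2 * (b*c - a*d), a*a - b*b + c*c - d*d, 2 * (a*b + c*d)],
        [2 * (a*c + b*d), 2 * (c*d - a*b), a*a - b*b - c*c + d*d],
    ]


def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


unit_q = (0.5, 0.5, 0.5, 0.5)          # a^2 + b^2 + c^2 + d^2 = 1
print(det3(rotation_matrix(unit_q)))   # determinant of a unit quaternion

general_q = (1, 2, 3, 4)               # squared norm 30, exact integer arithmetic
print(det3(rotation_matrix(general_q)), 30 ** 3)
```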

Let us now show that this rotation is a half-turn about an axis (the example, which may seem
particular, is in fact general!):

First, if:

$$q = xi + yj + zk$$

we have:

$$S_q(q) = q q \bar{q} = q \tag{2.224}$$

which means that the axis of rotation $(x, y, z)$ is fixed by the application $S_q$ itself!

On the other hand, we have seen that if $q$ is a purely complex quaternion of norm 1 then:

$$q^{-1} = \bar{q} \quad \text{and} \quad \bar{q} = -q \tag{2.225}$$

Which gives us the relation:

$$q^2 = q \cdot (-\bar{q}) = q \cdot (-(q^{-1})) = -1$$

This result leads us to calculate the rotation of a rotation:

$$S_q(S_q(p)) = q (q p \bar{q}) \bar{q} = q^2 p \bar{q}^2 = S_{q^2}(p) = (-1) p (-q)^2 = -p q^2 = (-1) p (-1) = p \tag{2.227}$$

Conclusion: Since the rotation of a rotation is a full turn, $S_q$ is necessarily a half-turn:

$$S_q(p) = -p \tag{2.228}$$

relatively (!) to the axis $(x, y, z)$, that is to say for every pure quaternion $p$ orthogonal to this axis.

At this stage, we can say that any rotation of space can be represented by $S_q$ (the conjugation
by a quaternion $q$ of norm 1). Indeed, the half-turns generate the group of rotations, that is to say
that any rotation can be expressed as the product of a finite number of half-turns, and therefore
as the conjugation by a product of quaternions of unit norm (a product which is itself a quaternion
of unit norm...).

We will still give an explicit form connecting a rotation and the quaternion that represents it,
just as we did for complex numbers.

Theorem 4.6. Given $u = (x, y, z)$ a unit vector and $\theta \in [0, 2\pi]$ an angle. Then we affirm that the
rotation of axis $u$ and angle $\theta$ corresponds to the application $S_q$, where $q$ is the quaternion:

$$q = \cos\left(\frac{\theta}{2}\right) + \sin\left(\frac{\theta}{2}\right)(xi + yj + zk)$$

For this assertion to be verified, we know that we need:

• The norm of $q$ to be equal to 1

• The determinant of the application $S_q$ to be equal to 1

• The application $S_q$ to conserve the norm

• The application $S_q$ to send every vector collinear to the axis of rotation onto the axis of rotation

Proof 4.6.1. OK, let us check every point:

1. The norm of the quaternion previously proposed is indeed equal to 1. As $u = (x, y, z)$ is of
unit norm, we have:

$$x^2 + y^2 + z^2 = 1$$

and therefore:

$$|q|^2 = \cos^2\left(\frac{\theta}{2}\right) + \sin^2\left(\frac{\theta}{2}\right)(x^2 + y^2 + z^2)
= \cos^2\left(\frac{\theta}{2}\right) + \sin^2\left(\frac{\theta}{2}\right) = 1 \tag{2.232}$$

2. The fact that $q$ is a quaternion of unit norm immediately leads to the fact that the
determinant of the application $S_q$ is also equal to 1. We have already proved it above in the
general case of any quaternion of norm 1 (necessary and sufficient condition).

3. It is the same for the conservation of the norm. We have already proved above
that this is the case when the quaternion $q$ is of norm 1 (necessary and sufficient
condition).

4. Let us now prove that every vector collinear to the axis of rotation is sent onto the axis
of rotation itself. Let us denote by $q'$ the purely imaginary unit quaternion $xi + yj + zk$.
Then we have:

$$q = \cos\left(\frac{\theta}{2}\right) + \sin\left(\frac{\theta}{2}\right) q'
\qquad S_q(q') = q q' \bar{q}$$

but as $q$ is a real combination of 1 and $q'$, the quaternions $q$ and $q'$ commute, so that
we can write:

$$S_q(q') = q q' \bar{q} = q' q \bar{q} = q'$$


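Let us now show why we choose the writing $\theta/2$. If $v = (x_1, y_1, z_1)$ denotes a unit vector
orthogonal to $u$ (therefore perpendicular to the axis of rotation), and $p$ the quaternion
$x_1 i + y_1 j + z_1 k$, then we have:

$$\begin{aligned}
S_q(p) &= \left(\cos\left(\frac{\theta}{2}\right) + \sin\left(\frac{\theta}{2}\right) q'\right) p
\left(\cos\left(\frac{\theta}{2}\right) - \sin\left(\frac{\theta}{2}\right) q'\right) \\
&= \cos^2\left(\frac{\theta}{2}\right) p + \cos\left(\frac{\theta}{2}\right)\sin\left(\frac{\theta}{2}\right)(q'p - pq')
- \sin^2\left(\frac{\theta}{2}\right) q' p q'
\end{aligned}$$

We have shown during the definition of the multiplication of two quaternions that, for two
orthogonal pure quaternions:

$$pq' = -q'p$$

therefore we get:

$$\begin{aligned}
S_q(p) &= \cos^2\left(\frac{\theta}{2}\right) p + 2\cos\left(\frac{\theta}{2}\right)\sin\left(\frac{\theta}{2}\right) q'p
- \sin^2\left(\frac{\theta}{2}\right) q' p q' \\
&= \cos^2\left(\frac{\theta}{2}\right) p + 2\cos\left(\frac{\theta}{2}\right)\sin\left(\frac{\theta}{2}\right) q'p
+ \sin^2\left(\frac{\theta}{2}\right) q' p (-q') \\
&= \cos^2\left(\frac{\theta}{2}\right) p + 2\cos\left(\frac{\theta}{2}\right)\sin\left(\frac{\theta}{2}\right) q'p
+ \sin^2\left(\frac{\theta}{2}\right) q' p \bar{q'}
\end{aligned}$$

We have also proved earlier above that:

$$q' p \bar{q'} = S_{q'}(p) = -p$$

(the half-turn of axis $(x, y, z)$). So:

$$\begin{aligned}
S_q(p) &= \cos^2\left(\frac{\theta}{2}\right) p + 2\cos\left(\frac{\theta}{2}\right)\sin\left(\frac{\theta}{2}\right) q'p
- \sin^2\left(\frac{\theta}{2}\right) p \\
&= \left(\cos^2\left(\frac{\theta}{2}\right) - \sin^2\left(\frac{\theta}{2}\right)\right) p
+ 2\cos\left(\frac{\theta}{2}\right)\sin\left(\frac{\theta}{2}\right) q'p \\
&= \cos(\theta) p + \sin(\theta) q'p
\end{aligned}$$

□ Q.E.D.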

We know that $p$ is the pure quaternion identified with a unit vector $v$ orthogonal to the axis of
rotation $u$, itself identified with the purely imaginary part $q'$ of $q$. We notice then immediately
that the imaginary part of the product $q'p$ is equal to the cross product
$u \times v = w$. This vector product therefore generates a vector perpendicular to both $u$ and $v$.

The pair $(v, w)$ thus forms a plane perpendicular to the axis of rotation $u$ (as for the simple
complex numbers $\mathbb{C}$, for which we have the Gaussian plane and, perpendicular to it, the axis of
rotation).

Then finally:

$$S_q(p) = \cos(\theta) p + \sin(\theta) q'p = \cos(\theta) v + \sin(\theta) w \tag{2.242}$$

We fall back on a rotation in a plane (but this time located in space!) identical to that
shown earlier above with the standard complex numbers $\mathbb{C}$ in the Gaussian plane. For more
details the reader can refer to the section Spinor Calculus.

So we know how to do any kind of rotation in space in a single mathematical operation and with
a bonus: the free choice of the axis!
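
The relation (2.242) can be tested numerically. The sketch below is written in Python under
stated assumptions: the helper names (`qmul`, `qconj`, `cross`, `rotate`), the axis $u$, the
orthogonal unit vector $v$ and the angle are all chosen for illustration only.

```python
import math


def qmul(p, q):
    """Product of two quaternions given as tuples (a, b, c, d)."""
    a, b, c, d = p
    a2, b2, c2, d2 = q
    return (
        a * a2 - b * b2 - c * c2 - d * d2,
        (a * b2 + b * a2) + (c * d2 - d * c2),
        (a * c2 + c * a2) - (b * d2 - d * b2),
        (d * a2 + a * d2) + (b * c2 - c * b2),
    )


def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def rotate(v, axis, theta):
    """Rotate the 3-vector v about the unit vector axis by theta, using S_q(p) = q p conj(q)."""
    s = math.sin(theta / 2.0)
    q = (math.cos(theta / 2.0), s * axis[0], s * axis[1], s * axis[2])
    p = (0.0, v[0], v[1], v[2])
    return qmul(qmul(q, p), qconj(q))[1:]


u = (1 / 3, 2 / 3, 2 / 3)      # unit axis of rotation
v = (2 / 3, -2 / 3, 1 / 3)     # unit vector orthogonal to u
theta = 1.0                    # angle in radians
w = cross(u, v)

rotated = rotate(v, u, theta)
expected = tuple(math.cos(theta) * vi + math.sin(theta) * wi for vi, wi in zip(v, w))
print(rotated)
print(expected)
```

Up to rounding errors both lines print the same vector, and a vector collinear to $u$ is left
unchanged by `rotate`.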

We can now better understand why the algebra of quaternions is not commutative. Indeed, the
vector rotations of the plane are commutative, but those of space are not, as the following
example shows.

Given the initial configuration:

Figure 4.15 - Starting situation for quaternion rotations

Then a rotation about the X-axis followed by a rotation about the Y-axis:

is not equal to a rotation about the Y-axis followed by a rotation about the X-axis:

Figure 4.17 - Example of non-equivalence for quaternion rotation
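
The non-equivalence of the two orders can also be observed numerically. The following Python
sketch composes two quarter turns about the X and Y axes; the helper names (`qmul`, `qconj`,
`rot_quat`, `apply`) and the choice of a 90 degree angle are assumptions made for this example.

```python
import math


def qmul(p, q):
    """Product of two quaternions given as tuples (a, b, c, d)."""
    a, b, c, d = p
    a2, b2, c2, d2 = q
    return (
        a * a2 - b * b2 - c * c2 - d * d2,
        (a * b2 + b * a2) + (c * d2 - d * c2),
        (a * c2 + c * a2) - (b * d2 - d * b2),
        (d * a2 + a * d2) + (b * c2 - c * b2),
    )


def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])


def rot_quat(axis, theta):
    """Unit quaternion cos(theta/2) + sin(theta/2)(x i + y j + z k) for a unit axis."""
    s = math.sin(theta / 2.0)
    return (math.cos(theta / 2.0), s * axis[0], s * axis[1], s * axis[2])


def apply(q, v):
    """Image of the 3-vector v by S_q."""
    return qmul(qmul(q, (0.0, v[0], v[1], v[2])), qconj(q))[1:]


qx = rot_quat((1.0, 0.0, 0.0), math.pi / 2)   # quarter turn about the X-axis
qy = rot_quat((0.0, 1.0, 0.0), math.pi / 2)   # quarter turn about the Y-axis

v = (1.0, 0.0, 0.0)
x_then_y = apply(qy, apply(qx, v))   # same as apply(qmul(qy, qx), v)
y_then_x = apply(qx, apply(qy, v))   # same as apply(qmul(qx, qy), v)
print(x_then_y)
print(y_then_x)
```

Up to rounding, the first order sends the vector (1, 0, 0) to (0, 0, -1) and the second to
(0, 1, 0): the two composed rotations differ, exactly as the products of the two quaternions do.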

The results will be fundamental for our understanding of spinors (see section Spinor Calculus)! 

2.2.9 Algebraic and Transcendental Numbers 
Definitions (#31): 

D1. We name "algebraic integer of degree n" any complex number that is a solution of a
univariate algebraic equation of degree n, i.e. a root of a polynomial of degree n (a concept that
we will discuss in the chapter of Algebra) whose coefficients are integers and whose dominant
coefficient is equal to 1.

D2. We name "algebraic number of degree n" any complex number that is a solution of a
univariate algebraic equation of degree n, i.e. a root of a polynomial of degree n whose
coefficients are rational.

The set of algebraic numbers is sometimes denoted by $\overline{\mathbb{Q}}$ or $\mathbb{A}$.



Theorem 4.7. A first interesting result, particularly in this area of study (mathematical
curiosity ...), is that a rational number is an "algebraic integer of degree n" if and only if it is
an integer (read several times if needed...). In scientific terms, we then say that the ring
$\mathbb{Z}$ is "integrally closed".

Proof 4.7.1. We will assume that the number $p/q$, where $p$ and $q$ are two relatively prime integers
(that is to say, rigorously, that the greatest common divisor of $p, q$ is equal to 1), is a root of
the following polynomial (see section Calculus) with relative integer coefficients ($\in \mathbb{Z}$) and
whose dominant coefficient is equal to 1:

$$x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0 \tag{2.243}$$

where the equality of the polynomial with zero is implicit.

In this case, after multiplying by $q^n$:

$$p^n = -(a_{n-1} p^{n-1} + \ldots + a_1 p q^{n-2} + a_0 q^{n-1}) q \tag{2.244}$$

Since the coefficients are by definition all integers, and their multiples in the parenthesis also,
the parenthesis necessarily has a value in $\mathbb{Z}$.

Therefore $q$ (at the right of the parenthesis) divides a power of $p$ (at the left of the equality),
which is possible in the set $\mathbb{Z}$ (because, as a reminder, our parenthesis has a value in this same
set) only if $q$ is equal to $\pm 1$ (as $p$ and $q$ are relatively prime).

So among all rational numbers, the only ones that are solutions of polynomial equations with relative
integer coefficients ($\in \mathbb{Z}$) and a dominant coefficient equal to 1 are the relative integers!

□ Q.E.D.
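
The theorem can be illustrated (not proved!) by a small brute-force search. The Python sketch
below evaluates a monic polynomial with integer coefficients on a finite grid of fractions; the
polynomial $x^3 - 6x^2 + 11x - 6$, the search bounds and the helper name `poly_eval` are
assumptions chosen for this example.

```python
from fractions import Fraction


def poly_eval(coeffs, x):
    """Evaluate a polynomial by Horner's scheme; coeffs are given from the highest degree down."""
    acc = Fraction(0)
    for c in coeffs:
        acc = acc * x + c
    return acc


# Monic polynomial with integer coefficients: x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
coeffs = [1, -6, 11, -6]

# All fractions p/q with |p| <= 12 and 1 <= q <= 12 (Fraction reduces them automatically).
candidates = {Fraction(p, q) for p in range(-12, 13) for q in range(1, 13)}

rational_roots = sorted(x for x in candidates if poly_eval(coeffs, x) == 0)
print(rational_roots)
print(all(root.denominator == 1 for root in rational_roots))
```

Every rational root found on this grid has denominator 1, in agreement with the theorem.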

To take another interesting and particular case, it is easy to show that any rational number is an
algebraic number. Indeed, let us take the simplest following univariate polynomial equation:

$$qx - p = 0 \tag{2.245}$$

where $p$ and $q$ are relatively prime and where $q$ is different from 1. As this is a simple
polynomial with rational coefficients ($\in \mathbb{Q}$), after rearrangement we have:

$$x = \frac{p}{q}$$

So since $p$ and $q$ are relatively prime and $q$ is different from 1, we indeed have that every
rational number is an "algebraic number of degree 1".

We also have the real (and irrational) number $\sqrt{2}$, which is an "algebraic integer of degree 2"
because it is a root of:

$$x^2 - 2 = 0 \tag{2.247}$$

and the complex number $i$ is also an "algebraic integer of degree 2" because it is a root of the
equation:

$$x^2 + 1 = 0 \tag{2.248}$$



Definition (#32): A "transcendental number" is a real or complex number that is not algebraic.
That is, it is not a root of a non-zero polynomial equation with rational coefficients.

Theorem 4.8. The set of all transcendental numbers is uncountable. The proof is simple and
requires no difficult mathematical development.

Proof 4.8.1. Indeed, since the polynomials with integer coefficients are countable, and since each
of these polynomials has a finite number of roots (see the Factorization Theorem in the section
Calculus), the set of algebraic numbers is countable! But Cantor's diagonal argument
(see section Set Theory) states that the real numbers (and therefore also the complex numbers) are
uncountable, so the set of all transcendental numbers must be uncountable.

In other words, there are many more transcendental numbers than algebraic numbers...

□ Q.E.D.

The best known transcendental numbers are $\pi$ and $e$. We are still looking to provide you with a
proof nicer and more intuitive than that of Hilbert or Lindemann-Weierstrass.

Here is a small summary of everything seen until now:

[Figure: diagram of the integer, rational, real algebraic, irrational and complex numbers, with
axes "Real part" and "Imaginary part"]

Figure 4.18 - Numbers Type $\mathbb{N}, \mathbb{Z}, \mathbb{Q}, \mathbb{R}, \mathbb{C}, \ldots$



2.2.10 Universe Numbers (normal numbers) 

Definition (#33): A "Universe number", also named "normal number", is a real number whose
infinite sequence of digits in every base $b$ is distributed uniformly, in the sense that each of the
$b$ digit values has the same natural density $1/b$. Intuitively this means that no digit, or (finite)
combination of digits, occurs more frequently than any other. The set of Universe numbers is
sometimes denoted $U$.

While a general proof can be given that almost all purely real numbers are Universe numbers,
this proof is not constructive and only very few specific numbers have been shown to be
Universe numbers. It is widely believed that the (computable) numbers $\sqrt{2}$, $\pi$, and $e$ are
Universe numbers, but a proof remains elusive still in this year 2016. All of them however are
strongly conjectured to be so because of some empirical evidence. It is not even known whether
all digits occur infinitely often in the decimal expansions of those constants. In particular, the
popular claim "every string of numbers eventually occurs in $\pi$" or "the whole Holy book is
contained in $\pi$" is not known to be true. It has been conjectured that every irrational algebraic
number is a Universe number; while no counterexamples are known, there also exists no
algebraic number that has been proven to be a Universe number in any base.

More formally, let $\Sigma$ be a finite alphabet of $b$ digits, and $\Sigma^\infty$ the set of all sequences
that may be drawn from that alphabet. Let $S \in \Sigma^\infty$ be such a sequence. For each $a$ in
$\Sigma$ let $N_S(a, n)$ denote the number of times the letter $a$ appears in the first $n$ digits of the
sequence $S$. We say that $S$ is a "simple Universe number" if the limit:

$$\lim_{n \to +\infty} \frac{N_S(a, n)}{n} = \frac{1}{b}$$

holds for each $a$.

Now let $w$ be any finite string in $\Sigma^*$ and let $N_S(w, n)$ be the number of times the string $w$
appears as a substring in the first $n$ digits of the sequence $S$ (for instance, if $S = 01010101...$,
then $N_S(010, 8) = 3$). Then $S$ is a "Universe number" if, for all finite strings $w \in \Sigma^*$:

$$\lim_{n \to +\infty} \frac{N_S(w, n)}{n} = \frac{1}{b^{|w|}}$$

where $|w|$ denotes the length of the string $w$.

$S$ is therefore a Universe number if all strings of equal length occur with equal asymptotic
frequency. A given infinite sequence is either a Universe number or not, whereas a pure real
number, having a different base-$b$ expansion for each integer $b \geq 2$, may be a Universe number
in one base but not in another. A "disjunctive sequence" is a sequence in which every finite string
appears. A Universe number sequence is a "disjunctive sequence", but a disjunctive sequence
need obviously not be a Universe number.
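
The counting function $N_S(w, n)$ is easy to experiment with. The following Python sketch is
only an illustration: the function names `count_occurrences` and `champernowne_digits` are
assumptions, and the sequence obtained by concatenating the positive integers (Champernowne's
constant, known to be normal in base 10) is used merely as a convenient test sequence.

```python
def count_occurrences(s, w, n):
    """N_S(w, n): number of (possibly overlapping) occurrences of w in the first n digits of s."""
    prefix = s[:n]
    return sum(1 for k in range(len(prefix) - len(w) + 1) if prefix.startswith(w, k))


def champernowne_digits(n):
    """First n digits of the sequence 1 2 3 4 ... 9 10 11 12 ... written one after the other."""
    parts, total, k = [], 0, 1
    while total < n:
        text = str(k)
        parts.append(text)
        total += len(text)
        k += 1
    return "".join(parts)[:n]


# The example of the text: S = 01010101..., w = 010, n = 8
example = count_occurrences("01010101", "010", 8)
print(example)

# Empirical frequencies N_S(a, n) / n of each digit on a finite prefix.
digits = champernowne_digits(10000)
freqs = {a: count_occurrences(digits, a, len(digits)) / len(digits) for a in "0123456789"}
print(freqs)
```

On a finite prefix the frequencies are only approximations of the limits in the definition; they
say nothing rigorous about the infinite sequence.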

It is possible to prove (yet we do not wish to present this proof in a book on applied
mathematics) with the "Universe number theorem" that almost all pure real numbers are Universe
numbers. The set of non-Universe numbers, though "small" in the sense of being a null set, is
"large" in the sense of being uncountable (for example, no rational number is normal in any base,
since the digit sequences of rational numbers are eventually periodic!). For instance, there are
uncountably many numbers whose decimal expansion does not contain the digit 5, and none of
these are Universe numbers.


2.2.11 Abstract Numbers (variables) 

Definitions (#34): A number may be considered by abstracting from the nature of the
objects that constitute the group that it characterizes, as well as from how it is codified (Indian
notation, Roman notation, etc.). We then say that the number is an "abstract number". In other
words, an abstract number is a number that does not designate the quantity of any particular kind
of object.

Arbitrarily, the human being has adopted a numerical system mainly used in the World
and represented by the symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 of the decimal system, which will be
supposed known both in writing and orally by the reader (language learning).

For mathematicians, it is not advantageous to work with these symbols because they represent
only specific cases. What theoretical physicists and mathematicians seek are "literal relations"
applicable in a general case, in which engineers can, according to their needs, replace the abstract
numbers by the numeric values that correspond to the problem they need to solve.

These abstract numbers, today commonly named "variables" or "unknowns" and used in the context
of "literal calculation", have very often been represented since the 16th century by:

1. The Latin alphabet:

a, b, c, d, e, ..., x, y, z; A, B, C, D, E, ..., X, Y, Z

where the first lowercase letters of the Latin alphabet (a, b, c, d, e, ...) are often used to
represent an abstract constant, while the lowercase letters of the end of the Latin alphabet
(..., x, y, z) are used to represent entities (variables or unknowns) whose value we seek.

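2. The Greek alphabet:

Α α  alpha      Η η  eta        Ν ν  nu         Τ τ  tau
Β β  beta       Θ θ  theta      Ξ ξ  xi         Υ υ  upsilon
Γ γ  gamma      Ι ι  iota       Ο ο  omicron    Φ φ  phi
Δ δ  delta      Κ κ  kappa      Π π  pi         Χ χ  chi
Ε ε  epsilon    Λ λ  lambda     Ρ ρ  rho        Ψ ψ  psi
Ζ ζ  zeta       Μ μ  mu         Σ σ  sigma      Ω ω  omega

Table 4.10 - Greek Alphabet

which is particularly used to represent more or less complex mathematical operators (such
as the indexed sum $\Sigma$, the indexed product $\Pi$, the variational $\delta$, the infinitesimal element
$\varepsilon$, the partial differential $\partial$, etc.) or variables in the field of physics (as $\omega$ for the
pulsation, $\nu$ for the frequency, $\rho$ for the density, etc.).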



3. The modernized Hebrew alphabet (with less intensity...)

As we have seen, a transfinite cardinal for example is denoted by the letter "aleph": $\aleph_0$.

Although these symbols can represent any number, some of them represent physical
constants, also named "Universal constants", such as the speed of light $c$, the gravitational
constant $G$, the Planck constant $h$, the number $\pi$, etc.

We very often use still other symbols, which we will introduce and define in the course of this book.

Domain of a Variable

A variable is therefore likely to take different numerical values. The set of these values can
vary according to the character of the problem considered.

Given two numbers $a$ and $b$ such that $a < b$, then:

Definitions (#35):

D1. We name "domain of definition" of a variable the set of all numerical values it is likely to take
between two specified limits (endpoints) or on a set (like $\mathbb{N}$, $\mathbb{R}$, $\mathbb{R}^+$, etc.).

D2. We name "closed interval with endpoints a and b" the set of all numbers $x$ between these
two values, endpoints included, and we denote it for example as follows:

$[a, b] = \{x \in \mathbb{R} \mid a \le x \le b\}$ (2.251)

The notation on the left is named "interval notation", the one on the right is named
"set-builder notation".

D3. We name "open interval with endpoints a and b" the set of all numbers $x$ between these
two values, endpoints not included, and we denote it for example as follows:

$]a, b[ = \{x \in \mathbb{R} \mid a < x < b\}$ (2.252)

D4. We name "interval closed left, open right" or "semi-closed left" the following set:

$[a, b[ = \{x \in \mathbb{R} \mid a \le x < b\}$ (2.253)

D5. We name "interval open left, closed right" or "semi-closed right" the following set:

$]a, b] = \{x \in \mathbb{R} \mid a < x \le b\}$
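The four kinds of bounded interval can be illustrated by a small membership test. This is only a minimal sketch: the class name `Interval` and its fields are our own assumptions for the illustration and do not come from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Bounded real interval with endpoints a < b (hypothetical helper).

    closed_left / closed_right say whether each endpoint belongs to the set.
    """
    a: float
    b: float
    closed_left: bool = True
    closed_right: bool = True

    def __contains__(self, x: float) -> bool:
        # a <= x for a closed left end, a < x for an open one (same on the right)
        left_ok = self.a <= x if self.closed_left else self.a < x
        right_ok = x <= self.b if self.closed_right else x < self.b
        return left_ok and right_ok


closed = Interval(0, 1)                      # [0, 1]
opened = Interval(0, 1, False, False)        # ]0, 1[
left_closed = Interval(0, 1, True, False)    # [0, 1[
right_closed = Interval(0, 1, False, True)   # ]0, 1]
```

With these objects, `0 in closed` holds while `0 in opened` does not, which is exactly the difference between the symbols $\le$ and $<$ in the set-builder notation.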



Or in summary form, as often denoted in Switzerland:

Type                                              Math notation     Explicitly
Closed bounded interval                           $[a, b]$          $a \le x \le b$
Bounded interval, semi-closed on a, semi-open
on b (left semi-closed, right semi-open)          $[a, b[$          $a \le x < b$
Bounded interval, semi-open on a, semi-closed
on b (left semi-open, right semi-closed)          $]a, b]$          $a < x \le b$
Bounded open interval                             $]a, b[$          $a < x < b$
Unbounded interval closed on b (closed right)     $]-\infty, b]$    $x \le b$
Unbounded interval open on b (open right)         $]-\infty, b[$    $x < b$
Unbounded interval closed on a (closed left)      $[a, +\infty[$    $a \le x$
Unbounded interval open on a (open left)          $]a, +\infty[$    $a < x$

Table 4.11 - Summary of the main interval cases

and according to the international standard ISO 80000-2:2009 (since Switzerland has the art of not
respecting international norms and standards):

Type                                              Math notation     Explicitly
Closed bounded interval                           $[a, b]$          $a \le x \le b$
Bounded interval, semi-closed on a, semi-open
on b (left semi-closed, right semi-open)          $[a, b)$          $a \le x < b$
Bounded interval, semi-open on a, semi-closed
on b (left semi-open, right semi-closed)          $(a, b]$          $a < x \le b$
Bounded open interval                             $(a, b)$          $a < x < b$
Unbounded interval closed on b (closed right)     $(-\infty, b]$    $x \le b$
Unbounded interval open on b (open right)         $(-\infty, b)$    $x < b$
Unbounded interval closed on a (closed left)      $[a, +\infty)$    $a \le x$
Unbounded interval open on a (open left)          $(a, +\infty)$    $a < x$

Table 4.12 - Summary of the main interval cases (ISO notation)




R1. The notation $\{x \text{ such that } a < x < b\}$ denotes the set of real numbers $x$ strictly
greater than $a$ and strictly less than $b$.

R2. The fact that an interval is, for example, open on $b$ means that the real number $b$ is
not part of it. Conversely, if it had been closed, then $b$ would be part of it.

R3. If the variable $x$ can take all possible negative and positive values, we write
$]-\infty, +\infty[$, where the symbol "$\infty$" means "infinity". Obviously there can be
combinations, such as intervals unbounded on the right with a left endpoint, and vice versa.

R4. We will come back to some of these concepts with a different approach when studying
Algebra (literal calculation).

We say that the variable $x$ is an "ordered variable" if, when we represent its domain of definition
by a horizontal axis on which each point represents a value of $x$, we can say for each pair of
values that one is the "antecedent" and the other the "subsequent". Here the
notion of antecedent and subsequent is not related to the concept of time; it just expresses how
the values of the variable are ordered.

Definitions (#36): 

D1. A variable is said to be "increasing" if each subsequent value is greater than each
antecedent value.

D2. A variable is said to be "decreasing" if each subsequent value is smaller than each
antecedent value.

D3. The increasing and decreasing variables are named "variables with monotonic variations"
or simply "monotonic variables".
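As an illustration of these three definitions, here is a minimal sketch that tests a finite list of successive values. The function names are our own choice; comparing consecutive values is enough because the order relation is transitive.

```python
def is_increasing(values):
    """True if each subsequent value is strictly greater than its antecedent."""
    return all(x < y for x, y in zip(values, values[1:]))


def is_decreasing(values):
    """True if each subsequent value is strictly smaller than its antecedent."""
    return all(x > y for x, y in zip(values, values[1:]))


def is_monotonic(values):
    """True if the values are increasing or decreasing."""
    return is_increasing(values) or is_decreasing(values)
```

For example, the successive values 1, 2, 5 are increasing (hence monotonic), whereas 1, 3, 2 are neither increasing nor decreasing.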



Arithmetic Operators 

Talking about numbers as we did in the previous section naturally leads us to consider the
operations of calculation. It is therefore logical that we give a non-exhaustive description of the
operations that may exist between numbers. This is the goal of this section.

We will consider in this book that there are two types of key tools in arithmetic (we are not
speaking of algebra but of arithmetic!):

• Arithmetic operators:

There are two basic operators (addition "+" and subtraction "−") from which we can build
other operators: "multiplication" (whose contemporary symbol × was introduced in
1631 by William Oughtred) and "division" (whose old symbol was ÷, but since the end
of the 20th century we simply use the slash symbol /).

These four operators are commonly named "rational operators". We will see them in more
detail after defining the binary relations.


Rigorously, addition alone could be enough if we consider the common set of real numbers
$\mathbb{R}$, because subtraction is then only the addition of a negative number.

• Binary operators (relations):

There are six basic binary relations (equal $=$, different $\ne$, greater than $>$, less than $<$,
greater than or equal $\ge$, less than or equal $\le$) that compare the order of magnitude of the elements
on the left and on the right of the relation (thus two in number, hence
the name "binary") in order to draw some conclusions. The majority of binary relation
symbols were introduced by Vieta and Harriot in the 16th and early 17th centuries.

It is obviously essential to know these tools and their properties as well as possible before going
on to more strenuous calculations.

3.1 Binary Relations 

Definitions (#37): 

D1. Consider two non-empty sets $E$ and $F$ (see section Set Theory), not necessarily identical.
If to some given elements $x$ of $E$ we can associate, by a precise (unambiguous) mathematical rule $R$,
one element $y$ of $F$, we thereby define a "functional relation" that maps
$E$ to $F$ and that we write:

$R : E \to F$ (3.1)

Thus, more generally, a functional relation $R$ can be defined as a mathematical rule that
associates with given elements $x$ of $E$ some given elements $y$ of $F$.

So, in this more general context, if $xRy$, we say that $y$ is an "image" of $x$ through $R$
and that $x$ is a "precedent" or "preimage" of $y$.

The set of pairs $(x, y)$ such that $xRy$ is a true statement forms a "graph" or "representation"
of the relation $R$. We can represent these pairs in a suitably chosen way to make
a graphical representation of the relation $R$.

This is a type of relation to which we will come back in the section Functional Analysis
under the form $R : f(x) = y$ and that does not interest us directly in this section.

D2. Consider a non-empty set $E$. If we associate with this set (and only with this one!) tools to
compare its elements with one another, then we talk about a "binary relation" or "comparison
relation", and we write for any elements $x$ and $y$ of $E$:

$xRy$ (3.2)

These relations can also, most of the time, be presented graphically. In the case of the conventional
comparison operators, where $E$ is the set of natural numbers $\mathbb{N}$, relative integers $\mathbb{Z}$, rationals
$\mathbb{Q}$ or reals $\mathbb{R}$, the relation is represented graphically by a horizontal line (typically...); in the case
of congruence (see section Number Theory) it is represented by lines in the plane whose
points are given by the constraint of congruence.

3.1.1 Equalities 

It is difficult to define the term "equality" in a general way applicable to any situation. For
our part, we will take inspiration for this definition from the extensionality
theorem of Set Theory (discussed later in another section).

Definitions (#38): 

D1. Two elements are "equal" if and only if they have the same value. Strict equality is
denoted by the symbol $=$, which means "equal to" (this symbol was introduced
in 1557 by Robert Recorde).

If we have $a = b$, and $c$ is any given number (or vector/matrix) and $\star$ any operation (such
as addition, subtraction, multiplication or division), then:

$a \star c = b \star c$ (3.3)

This property is used to solve or simplify any type of equation. In practice, the abbreviation
"LHS" is informal shorthand for the left-hand side of an equality. Similarly, "RHS"
is the abbreviation for its right-hand side.


Obviously we have (property of symmetry):

$a = b \Leftrightarrow b = a$ (3.4)

And also (property of transitivity):

$a = b$ and $b = c \Rightarrow a = c$

We will not enumerate the other properties of equality in this section (for more details
see the section Set Theory).

D2. If two elements are not strictly equal, that is to say "unequal", we connect them
with the symbol $\ne$ and we say that they are "not equal".

If we have $a > b$ or $a < b$ then:

$a \ne b$


There are still other equality symbols, which extend the two we have defined previously.
Unfortunately, they are often misused (or rather, used in the wrong
places) in most of the books available on the market (and this book is no exception):

1. $\cong$: Should be used for congruence but in fact is mostly used to indicate an approximation.

2. $\approx$: Should be used for approximations but in fact $\cong$ is used instead.

3. $\equiv$: Should be used to say that two elements are equivalent, but in practice most people use
another symbol.

4. $:=$: Is used to say that one element is by definition equal to another one.

5. $\triangleq$: Should be used to say "equal by definition to" but in fact most people use $:=$ instead.

6. $\sim$: Is used most of the time in Statistics to say "follows the law...", but some practitioners use
it instead of $\approx$ or to say "asymptotically equal".

3.1.2 Comparators 

The comparators are tools that allow us to compare and order any pair of numbers.

The possibility of ordering numbers is fundamental in mathematics. Otherwise (if it were not
possible to order), many things would shock our habits, for example (some of
the concepts mentioned in the following sentence have not yet been presented, but we
refer to them anyway): no more monotonic functions (especially sequences) and, linked to this,
the derivative would no longer indicate anything about the "direction of variation"; no more
approximation of the roots of a polynomial by dichotomy (the classical search algorithm in an ordered set,
which splits the search range in two at each iteration); no more segments in geometry; no more half-spaces; no
more convexity; no more orientation of space; etc. It is therefore important to be able to
order things, as you can see!

Thus, for any $a, b, c \in \mathbb{R}$ we write, when $a$ is greater than or equal to $b$:

$a \ge b$ (3.6)

and when $a$ is less than or equal to $b$:

$a \le b$



It is useful to recall that the set of real numbers $\mathbb{R}$ is a totally ordered group (see section
Set Theory), otherwise we could not establish order relations among its elements (which
is not the case for the complex numbers $\mathbb{C}$, which we cannot order!).

Definition (#39): The symbol $\le$ is an "order relation" (see the rigorous definition further
below!) which means "less than or equal to", and conversely the symbol $\ge$ is also an order relation,
which means "greater than or equal to".

We also have, for the strict comparisons, the following properties, which are relatively intuitive:

$a < b$ and $b < c \Rightarrow a < c$

and vice versa:

$a > b$ and $b > c \Rightarrow a > c$

We also have:

$a > b$ and $b = c \Rightarrow a > c$

and vice versa:

$a < b$ and $b = c \Rightarrow a < c$

as well as:

$a < b$ and $c > 0 \Rightarrow ac < bc$

$a < b \Rightarrow a + c < b + c$ and $a - c < b - c$

$a > b \Rightarrow a + c > b + c$ and $a - c > b - c$

$0 < a < b \Rightarrow \dfrac{1}{a} > \dfrac{1}{b}$

$b < a < 0 \Rightarrow \dfrac{1}{a} < \dfrac{1}{b}$

We can obviously add or subtract the same term on each side of the relation, or multiply or divide
each side by the same positive number, and the relation remains true. Notice, however, that if we
multiply both sides by a negative number, the direction of the comparator changes, such that:

$a > b$ and $c < 0 \Rightarrow ac < bc$ (3.15)

and vice versa:

$a < b$ and $c < 0 \Rightarrow ac > bc$
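The change of direction of the comparator can be checked numerically. The following is a minimal sketch with helper names of our own (`compare`, `scaled_comparator`); it simply computes both products and compares them.

```python
def compare(x, y):
    """Return the comparator symbol that links x and y."""
    if x < y:
        return "<"
    if x > y:
        return ">"
    return "="


def scaled_comparator(a, b, c):
    """Comparator between a*c and b*c, computed directly from the products."""
    return compare(a * c, b * c)
```

With $a = 2$ and $b = 5$ (so $a < b$), multiplying by $c = 3$ keeps the comparator "<", while multiplying by $c = -3$ turns it into ">".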


We also have: 

$0 < a < b$ and $p \in \mathbb{R}^*_+ \Rightarrow a^p < b^p$ (3.17)

Consider now that $b < a < 0$ and $p \in \mathbb{N}^*$. Then if $p$ is an even integer:

$0 < a^p < b^p$ (3.18)

else if $p$ is odd:

$a^p > b^p$

This result simply comes from the rule of multiplication of signs, since a power, when not
fractional, is only a repeated multiplication.

Finally:

$0 < a < b$ and $n \in \mathbb{N}^* \Rightarrow \sqrt[n]{a} < \sqrt[n]{b}$
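These power and root inequalities can be spot-checked on a few values. This is a sketch only, with function names of our own; a numerical check on examples is of course not a proof, and the n-th root is computed in floating point as a power $1/n$.

```python
def power_comparator(a, b, p):
    """Return "<", ">" or "=" according to how a**p compares with b**p."""
    left, right = a ** p, b ** p
    if left < right:
        return "<"
    if left > right:
        return ">"
    return "="


def nth_root(x, n):
    """Floating-point n-th root of a positive number x."""
    return x ** (1.0 / n)
```

For $0 < 2 < 3$ the comparator stays "<" for any positive exponent; for $b = -3 < a = -2 < 0$ it is "<" for the even exponent 2 ($4 < 9$) and ">" for the odd exponent 3 ($-8 > -27$).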



The relations

$> \qquad < \qquad \le \qquad \ge \qquad \gg \qquad \ll$

thus correspond respectively to: (strictly) greater than, (strictly) smaller than, smaller or equal,
greater or equal, much bigger than, much smaller than.

These relations can be defined in a little more subtle and rigorous way and apply not only to 
comparators (see for example the congruence relation in the section of Set Theory)! 

Let us see this (the vocabulary that follows is also defined in the section of Set Theory): 

Definition (#40): A binary relation $R$ from a set $A$ to itself, also called a relation $R$ on $A$, is a subset of
the Cartesian product, $R \subseteq A \times A$ (that is to say, the binary relation generates a subset through the
constraints it imposes on the elements of $A$ satisfying the relation). It may have the property of being:

P1. A "reflexive relation" if $\forall x \in A$:

$xRx$

P2. A "symmetrical relation" if $\forall x, y \in A$:

$xRy \Rightarrow yRx$ (3.23)

P3. An "anti-symmetrical relation" if $\forall x, y \in A$:

$(xRy \text{ and } yRx) \Rightarrow x = y$ (3.24)

P4. A "transitive relation" if $\forall x, y, z \in A$:

$(xRy \text{ and } yRz) \Rightarrow xRz$ (3.25)

P5. A "connex relation" if $\forall x, y \in A$:

$xRy$ or $yRx$ (3.26)

Mathematicians have given special names to the families of relations satisfying some of these
properties.
Definitions (#41): 

D1. A relation is named a "strict order relation" if and only if it is transitive and antireflexive
(no element is related to itself; some authors leave this second condition implicit).

D2. A relation is named a "preorder" if and only if it is reflexive and transitive.

D3. A relation is named an "equivalence relation" if and only if it is reflexive, symmetric, and
transitive.

D4. A relation is named an "order relation" if and only if it is reflexive, transitive and
antisymmetric (thus the relations $>$, $<$ are not order relations because they are obviously not reflexive).

D5. A relation is named a "total order relation" if and only if it is reflexive, transitive, connex
and antisymmetric.
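On a finite set, the properties P1 to P5 can be tested exhaustively, which makes the above families easy to recognise. The sketch below is only an illustration: the function `properties` and the three sample relations are our own, and a relation is stored, as in Definition (#40), as a set of pairs, that is a subset of $A \times A$.

```python
from itertools import product


def properties(A, R):
    """Test P1..P5 for a relation R given as a set of pairs (x, y) of A x A."""
    return {
        "reflexive": all((x, x) in R for x in A),
        "symmetric": all((y, x) in R for (x, y) in R),
        "antisymmetric": all(x == y for (x, y) in R if (y, x) in R),
        "transitive": all((x, w) in R
                          for (x, y) in R for (z, w) in R if y == z),
        "connex": all((x, y) in R or (y, x) in R
                      for x, y in product(A, repeat=2)),
    }


A = range(4)
leq = {(x, y) for x, y in product(A, repeat=2) if x <= y}
lt = {(x, y) for x, y in product(A, repeat=2) if x < y}
same_parity = {(x, y) for x, y in product(A, repeat=2) if (x - y) % 2 == 0}
```

On this small set, `leq` is reflexive, antisymmetric, transitive and connex (a total order), `lt` is transitive but not reflexive (a strict order), and `same_parity` is reflexive, symmetric and transitive (an equivalence relation).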

For the other combinations it seems (as far as we know) that there is no special name among
mathematicians...


The binary relations all have similar properties in the sets of natural numbers $\mathbb{N}$, relative integers $\mathbb{Z}$, rationals $\mathbb{Q}$
and reals $\mathbb{R}$ (there is no natural order relation on the set of complex numbers $\mathbb{C}$).


If we summarize: 

Table 4.13 - Binary Relations

Thus we see that the binary relations $\le$, $\ge$ form, with the previously mentioned sets, total order
relations, and it is very easy to see which binary relations are partial orders, total orders or
equivalence relations.

Definition (#42): Let $R$ be an equivalence relation on $A$. For all $x \in A$, the "equivalence class" of
$x$ is by definition the set:

$[x] = \{y \in A : xRy\}$

$[x]$ is therefore a subset of $A$ ($[x] \subseteq A$), which we will sometimes also denote by $R$ thereafter (so be careful not to
confuse, in what follows, the equivalence relation with the subset itself...).

We thus have a new set, named the "set of equivalence classes" or "quotient set", denoted
in this book by $A/R$. So:

$A/R = \{[x] \mid x \in A\}$


You should know that in $A/R$ we no longer look at $[x]$ as a subset of $A$, but as an element!

An equivalence relation, presented in a popularized manner, thus serves to stick one unique
label on items that satisfy the same property, and to identify them with the said label (knowing
what we do with this label).
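To make the quotient set concrete, here is a minimal sketch that builds $A/R$ for a finite set. The function `quotient_set` is hypothetical (our own), and it assumes that the predicate passed to it really is an equivalence relation; each class is stored as a `frozenset` so that it can itself be an element of a set.

```python
def quotient_set(A, related):
    """Return A/R as a set of classes, where [x] = {y in A : related(x, y)}.

    `related` is assumed (not checked) to be an equivalence relation on A.
    """
    return {frozenset(y for y in A if related(x, y)) for x in A}


A = range(-4, 5)
# Two integers are related when their difference is divisible by 2.
classes = quotient_set(A, lambda x, y: (x - y) % 2 == 0)
```

Here `classes` contains exactly two elements, the class of the even numbers and the class of the odd numbers of the sample: nine numbers, but only two "labels".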




In the set of integers $\mathbb{Z}$, if we study the remainders of the division of a number by 2,
the result is always 0 or 1.

The equivalence class of zero is then named the "set of even integers", and the
equivalence class of one is named the "set of odd integers". So we have two equivalence
classes that form a partition of $\mathbb{Z}$ into two parts (always keep this simple example in mind for
the theoretical elements that follow; it helps a lot!).

If we name the first 0 and the second 1, we fall back on the operation rules between even
and odd numbers:

$0 + 0 = 0 \qquad 0 + 1 = 1 \qquad 1 + 1 = 0$ (3.29)

which respectively means that the sum of two even integers is even, that the sum of an
even and an odd integer is odd, and that the sum of two odd integers is even.

And for the multiplication:

$0 \times 0 = 0 \qquad 0 \times 1 = 0 \qquad 1 \times 1 = 1$ (3.30)

which respectively means that the product of two even integers is even, that the product
of an even and an odd integer is even, and that the product of two odd integers is odd.

Now, to verify that we are dealing with an equivalence relation, we should still check that
it is reflexive ($xRx$), symmetrical (if $xRy$ then $yRx$) and transitive (if $xRy$ and $yRz$ then
$xRz$). We will see how to check it a few paragraphs further below, because this example
is a very special case of a congruence relation.

Definition (#43): The application $f : A \to A/R$ defined by $x \mapsto [x]$ is named the "canonical
projection". Any element $y \in [x]$ is named a "class representative" of $[x]$.

Theorem 4.9. Now consider a set $E$. We propose to prove that there is a correspondence
between the set of equivalence relations on $E$ and the set of partitions of $E$. In other words, this
theorem says that an equivalence relation on $E$ is nothing more than a partition of $E$.

Proof 4.9.1. Let $R$ be an equivalence relation on $E$. We choose $I = E/R$ as the indexing set of the
partition and we set, for any $[x] \in E/R$, $E_{[x]} = [x]$.

We just have to check the following two properties of the definition of a partition to show that
the family $(E_{[x]})_{[x] \in E/R}$ is a partition of $E$:

P1. Given $[x], [y] \in E/R$ such that $[x] \ne [y]$, then (obviously) $E_{[x]} \cap E_{[y]} = \emptyset$.
P2. $E = \bigcup_{[x] \in E/R} E_{[x]}$ is obvious because if $x \in E$ then $x \in [x] = E_{[x]}$.



□ Q.E.D. 

Again, it should be easy to check, with the practical example of the division by 2 given
previously, that the partition into even and odd numbers satisfies these two properties (if not, the reader can
contact us and we will add this as an example).

We have therefore associated with the equivalence relation $R$ a partition of $E$. Conversely, if $(E_i)_{i \in I}$
is a partition of $E$, then we can verify almost as easily that the relation $R$ defined by $xRy$ if and only
if there exists $i \in I$ such that $x, y \in E_i$ is an equivalence relation! Both applications are thus
bijective and inverses of each other.
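The round trip between a partition and its equivalence relation can be tried on a small finite set. This is a sketch under our own naming (`relation_from_partition`, `partition_from_relation`); the relation is stored as a set of pairs and the partition as a set of blocks.

```python
def relation_from_partition(blocks):
    """xRy if and only if x and y belong to the same block E_i."""
    return {(x, y) for block in blocks for x in block for y in block}


def partition_from_relation(E, R):
    """The set of equivalence classes [x] = {y in E : xRy}."""
    return {frozenset(y for y in E if (x, y) in R) for x in E}


E = set(range(6))
blocks = {frozenset({0, 2, 4}), frozenset({1, 3, 5})}
R = relation_from_partition(blocks)
```

Starting from the partition into even and odd numbers, building the relation and then taking its classes gives back the same two blocks, as the theorem states.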




We will now apply an example, a little less trivial than the last one, to
the construction of the rings $\mathbb{Z}/d\mathbb{Z}$, after a few reminders (for the concept of ring see
the section Set Theory).


1. Given two numbers $n, m \in \mathbb{Z}$, we say that "$n$ divides $m$", and we write $n \mid m$, if
and only if there exists an integer $k \in \mathbb{Z}$ such that $m = kn$ (see section Numbers
Theory).

2. Let $d \ge 1$ be an integer. We define the relation $R$ by $nRm$ if and only if $d \mid (n - m)$,
or in other words $nRm$ if and only if there exists $k \in \mathbb{Z}$ such that $n = m + kd$.
Usually we write $n \equiv m$ (modulo $d$) instead of $nRm$ and we say that "$n$ is
congruent to $m$ modulo $d$". Remember also that $n \equiv 0$ (modulo $d$) if and only if $d$
divides $n$ (see section Numbers Theory).

We will now introduce an equivalence relation on $\mathbb{Z}$. Let us prove that for any integer
$d \ge 1$, the congruence modulo $d$ is an equivalence relation on $\mathbb{Z}$ (we have already proved
this in the section Number Theory in our study of congruence, but let us redo this work
for the fun...).

To prove this we simply have to check the three properties of an equivalence relation:

P1. Reflexivity: $n \equiv n$ since $n = n + 0 \cdot d$.

P2. Symmetry: If $n \equiv m$ then $n = m + kd$ and therefore $m = n + (-k)d$, that is to say
$m \equiv n$.

P3. Transitivity: If $n \equiv m$ and $m \equiv j$ then $n = m + kd$ and $m = j + k'd$, therefore
$n = j + (k + k')d$, that is to say $n \equiv j$.

In the above situation, we denote by $\mathbb{Z}/d\mathbb{Z}$ the set of equivalence classes and we
denote by $[n]_d$ the congruence class of a given integer $n$, given by:

$[n]_d = \{\dots, n - 2d, n - d, n, n + d, n + 2d, n + 3d, \dots\}$ (3.31)

(each difference of two values in the braces is divisible by $d$, and this is therefore an
equivalence class), thus:

$\mathbb{Z}/d\mathbb{Z} = \{[0]_d, [1]_d, [2]_d, \dots, [d - 1]_d\}$ (3.32)

In particular (trivial, since the two classes together give back all of $\mathbb{Z}$):

$\mathbb{Z}/2\mathbb{Z} = \{[0]_2, [1]_2\}$ (3.33)



The operations of addition and multiplication on $\mathbb{Z}$ also define operations of addition
and multiplication on $\mathbb{Z}/d\mathbb{Z}$. We then say that these operations are compatible with the
equivalence relation, and they form a ring (see section Set Theory).
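The compatibility of addition and multiplication with congruence modulo $d$ can be spot-checked numerically. The sketch below uses function names of our own and tests only a finite sample of integers, so it illustrates the statement without proving it; it relies on the fact that Python's `%` operator returns a remainder in $\{0, \dots, d-1\}$ for a positive $d$.

```python
def congruent(n, m, d):
    """n is congruent to m modulo d, i.e. d divides n - m."""
    return (n - m) % d == 0


def cls(n, d):
    """Canonical representative of the class [n]_d in {0, ..., d - 1}."""
    return n % d


def operations_are_compatible(d, sample=range(-12, 13)):
    """Check that changing representatives does not change the class
    of a sum or of a product (on a finite sample only)."""
    for n in sample:
        for m in sample:
            n2, m2 = n + 3 * d, m - 5 * d   # other representatives
            if cls(n + m, d) != cls(n2 + m2, d):
                return False
            if cls(n * m, d) != cls(n2 * m2, d):
                return False
    return True
```

For instance 17 and 2 are congruent modulo 5, the class of $-1$ modulo 5 is represented by 4, and the check succeeds for $d = 2$ (the even/odd rules seen earlier) as well as for $d = 7$.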


3.2 Fundamental Arithmetic Laws 

As we have said before, there is one fundamental operator (addition) from which we can define
multiplication, subtraction (provided that the chosen number set is adapted to it...) and
division (provided that the chosen number set is also adapted to it...), and around which we can
build the whole of Analytical Mathematics.

Obviously there are some subtleties to be considered when the level of rigour increases. The
reader can then refer to the section Set Theory, where the fundamental laws are redefined more
accurately than in what follows.

3.2.1 Addition 

Definition (#44): The addition of integers is an operation, denoted "+", whose only
purpose is to bring together in one number all the units contained in several others. The result of
the operation is named the "sum", the "total" or the "cumulative total". The numbers to be added are named
the "terms of the addition".

Thus, in $A + B + C + \dots$, the numbers $A, B, C, \dots$ are the terms of the addition and the result is the sum of the terms of the
addition.

Or in schematic form for a special case:

[Number line graduated from 0 to 12, with two successive jumps labelled 4 and +3]

Figure 4.19 - One possible schema for addition

Here is a list of some intuitive properties of the operation of addition that we assume without proof (as they are
in fact axioms):

P1. The sum of several numbers does not depend on the order of the terms. We then say that
addition is a "commutative operation". This means concretely, for any two numbers:

$A + B = B + A$ (3.34)

P2. The sum of several numbers does not change if we replace two or more of them by their
intermediate result. We then say that addition is an "associative operation":

$(A + B) + C = A + (B + C)$ (3.35)

P3. Zero is the neutral element of addition because any number added to zero gives that
same number:

$A + 0 = A$ (3.36)

P4. Depending on the set in which we work ($\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, ...), a number may have a counterpart
such that their sum is zero. We then say that the number has an "opposite" for the addition, such
that:

$A + (-A) = 0$


We have defined addition more rigorously using the Peano axioms in the particular case of
the natural numbers $\mathbb{N}$, as we have already seen in the section Numbers. With these axioms it
is possible to prove that there exists one and only one application (uniqueness), denoted "+", from
$\mathbb{N} \times \mathbb{N}$ to $\mathbb{N}$ satisfying:

$\forall n \in \mathbb{N}: n + 0 = n$

$\forall p \in \mathbb{N}, \forall q \in \mathbb{N}: p + s(q) = s(p + q)$ (3.38)

$\forall n \in \mathbb{N}: s(n) = n + 1$

where $s$ means "successor".
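These relations can be read as a recursive procedure. The following is a minimal sketch (the function names `s` and `add` are ours) that computes a sum of natural numbers using only the successor and the two rules $p + 0 = p$ and $p + s(q) = s(p + q)$; it is meant for small arguments only, since it recurses once per unit of $q$.

```python
def s(n):
    """Successor of the natural number n."""
    return n + 1


def add(p, q):
    """p + q for natural numbers, using only the two defining rules."""
    if q == 0:
        return p                 # rule: p + 0 = p
    return s(add(p, q - 1))      # rule: p + s(q') = s(p + q') with q = s(q')
```

For example, `add(3, 4)` unfolds into four applications of the successor to 3 and gives 7.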


As this book has not been written for mathematicians, we will skip the proof (relatively
long and of little interest in a business context) and we will assume that the application
"+" exists and is unique... and that it follows from the above properties.

Let $x_1, x_2, \dots, x_n$ be any numbers; then we can write their sum as follows:

$x_1 + x_2 + \dots + x_n = \sum_{i=1}^{n} x_i$

by defining lower and upper bounds for the indexed sum (below and above the uppercase Greek
letter Sigma).

Here are some properties of this condensed notation that should be obvious (if not, the
reader can send us a request and we will add the details):

$\sum_{i=1}^{n} k x_i = k \sum_{i=1}^{n} x_i \qquad \sum_{i=1}^{n} k = nk \qquad \sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$ (3.40)

where $k$ is a constant.
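The three properties of the indexed sum can be verified on any concrete list of numbers. The short sketch below (the function name `check_sum_properties` is ours) returns one boolean per property; it assumes that the two lists have the same length $n$.

```python
def check_sum_properties(xs, ys, k):
    """Check the three properties of the indexed sum on concrete data."""
    n = len(xs)
    return (
        sum(k * x for x in xs) == k * sum(xs),                    # constant factor
        sum(k for _ in range(n)) == n * k,                        # sum of a constant
        sum(x + y for x, y in zip(xs, ys)) == sum(xs) + sum(ys),  # sum of sums
    )
```

Calling it with, say, `xs = [3, 1, 4, 1, 5]`, `ys = [2, 7, 1, 8, 2]` and `k = 6` gives three true values.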

Let us now see some concrete examples of additions of various simple numbers, in order
to practice the basics:

The addition of two relatively small numbers is quite easy, since we have learned by
heart to count up to the number resulting from the operation (the examples are taken in the
decimal base).

For bigger numbers we can adopt another method, which must also be learned by
heart. For example:

    9  2  4  4
+   3  4  7  5
--------------

The algorithm (process) is the following: we add the columns (4 columns in
this example) from right to left. For the first column we have 4 + 5 = 9, which gives:

    9  2  4  4
+   3  4  7  5      (3.45)
--------------
             9

We continue like this with the second column, where we have 4 + 7 = 11, with the
difference that the result is now 10 or more: we write the right digit (1) and carry the left digit (1) to the next
(left) column. Therefore:

      +1
    9  2  4  4
+   3  4  7  5      (3.46)
--------------
          1  9

The third column is therefore calculated as 1 + 2 + 4 = 7, which gives us:

      +1
    9  2  4  4
+   3  4  7  5      (3.47)
--------------
       7  1  9

For the last column we have 9 + 3 = 12, and once again we carry the left digit
to the next column of the addition. Therefore:

+1    +1
    9  2  4  4
+   3  4  7  5      (3.48)
--------------
    2  7  1  9

and finally:

+1    +1
    9  2  4  4
+   3  4  7  5      (3.49)
--------------
 1  2  7  1  9

This example shows how we proceed for the addition of any numbers: we add
column by column from right to left, and if the result of a column addition is 10 or more,
we carry the left digit to the next (left) column.

This algorithm (process or methodology) of addition is quite simple to understand and to
execute. We will not go further into this subject for now.
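The column-by-column procedure can be written down as a short program. This is only a sketch for non-negative integers in the decimal base; the function name `column_add` is ours, and it returns both the sum and the list of carries produced by each column (from right to left), so that the worked example can be followed step by step.

```python
def column_add(x, y):
    """Add two non-negative integers column by column, from right to left.

    Returns (sum, carries) where carries[i] is the digit carried out of
    column i (column 0 is the rightmost one).
    """
    dx = [int(c) for c in reversed(str(x))]
    dy = [int(c) for c in reversed(str(y))]
    digits, carries, carry = [], [], 0
    for i in range(max(len(dx), len(dy))):
        col = carry
        col += dx[i] if i < len(dx) else 0
        col += dy[i] if i < len(dy) else 0
        digits.append(col % 10)   # right digit is written below the line
        carry = col // 10         # left digit is carried to the next column
        carries.append(carry)
    if carry:
        digits.append(carry)      # a final carry opens a new column
    total = int("".join(str(d) for d in reversed(digits)))
    return total, carries
```

For the example 9244 + 3475 the function returns 12719 with the carries 0, 1, 0, 1: a carry out of the second column and out of the fourth one, as in the layout above.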



3.2.2 Subtraction 

Definition (#45): Subtraction is a mathematical operation that represents the operation of
removing objects from a collection. More formally, the subtraction of the number $B$ from the number
$A$, denoted by the symbol "−", consists in finding the number $C$ which, added to $B$, gives $A$.

Remark: As we saw in the section Set Theory, subtraction in the set $\mathbb{N}$ is possible
only if $A \ge B$.

Formally, we write an inline literal subtraction in the form:

$A - B = C$

which must satisfy:

$A = B + C$

Or in schematic form for a special case:

$10 - 3 - 4 = 7 - 4 = 3$

[Number line graduated from 0 to 12, with two successive jumps labelled -3 and -4]

Figure 4.20 - One possible schema for subtraction

Here are some intuitive properties of the subtraction operation that we assume without proof
(as they can be deduced from those of addition...):

P1. The subtraction of several numbers depends on the order of the terms. We then say that
subtraction is a "non-commutative operation". Indeed:

$5 - 2 \ne 2 - 5$

P2. The subtraction of several numbers changes if we replace two or more of them by their
intermediate result. We then say that subtraction is a "non-associative operation". Indeed:

$5 - (3 - 2) \ne (5 - 3) - 2$

P3. Zero is not the neutral element of subtraction. Indeed, any number from which we
subtract zero gives the same number, so zero is neutral on the right... but not on the left, because
subtracting a number from zero does not give that number back! We then say that zero is only
"neutral on the right" in the case of subtraction. Indeed:

$a - 0 = a$ but $0 - a = -a \ne a$ (for $a \ne 0$)

In more complicated cases we use a special vocabulary:

    7  0  4      Minuend
-   5  1  2      Subtrahend
-----------
    1  9  2      Rest or Difference

The "minuend" is 704 and the "subtrahend" is 512. The minuend digits are $m_3 = 7$, $m_2 = 0$ and
$m_1 = 4$. The subtrahend digits are $s_3 = 5$, $s_2 = 1$ and $s_1 = 2$. Beginning at the ones place, 4 is
not less than 2, so the difference 2 is written down in the result's ones place. In the tens place, 0
is less than 1, so the 0 is increased by 10, and the difference with 1, which is 9, is written down
in the tens place. The American method corrects for the increase of ten by reducing the digit in
the minuend's hundreds place by one. That is, the 7 is struck through and replaced by a 6. The
subtraction then proceeds in the hundreds place, where 6 is not less than 5, so the difference 1 is
written down in the result's hundreds place. We are now done: the result is 192.

Let us now see some concrete examples of subtractions of various simple numbers, in order
to practice the basics:

The subtraction of two relatively small numbers is pretty easy once we have memorized how to
count up to at least the number resulting from this operation.

For larger numbers another method must be learned by heart (as for
addition). For example:

    4  5  7  4
-   3  7  8  5
--------------

We subtract the columns (4 columns in this example) from right to left. In the first column
we have 4 − 5 = −1 < 0, so we carry −1 to the next column (the second one) and we write
10 − 1 = 9 below the horizontal line of the first column:

         -1
    4  5  7  4
-   3  7  8  5      (3.60)
--------------
             9

We continue in the same way with the second column: 7 − 8 = −1 < 0, so we carry −1
to the next column (the third one), and as −1 − 1 = −2 we write 10 − 2 = 8 below the
horizontal line of the second column:

      -1 -1
    4  5  7  4
-   3  7  8  5      (3.61)
--------------
          8  9

The third column is calculated as 5 − 7 = −2 < 0, so we carry −1 to the next column
(the fourth one), and as −1 − 2 = −3 we write 10 − 3 = 7 below the line of the third column:

   -1 -1 -1
    4  5  7  4
-   3  7  8  5      (3.62)
--------------
       7  8  9

In the last column we have 4 − 3 = 1 > 0, therefore we carry nothing to the next
column, and as 1 − 1 = 0 we write 0 below the line of the fourth column:

   -1 -1 -1
    4  5  7  4
-   3  7  8  5      (3.63)
--------------
    0  7  8  9

That is how we proceed to subtract any numbers: we subtract column by column
from right to left, taking into account the −1 carried from the previous column; if the result for a
column is less than zero, we carry −1 to the next column and write 10 plus this negative result
below the line.
When we mix addition and subtraction we have the following relations, which should
be obvious to most readers:

$a + (b - c) = (a + b) - c$
$a - (b + c) = (a - b) - c$
$a - (b - c) = (a - b) + c$          (3.64)
$a - b = (a - c) - (b - c)$

Since the methodology used for subtraction is based on exactly the same rules as for addition,
we will not expand on the subject further, as this seems unnecessary from our point of view. This method
is very simple and of course requires some habit of working with numbers to be fully understood
and mastered.


3.2.3 Multiplication 

Definition (#46): The multiplication of numbers is an operation whose purpose is, given two
numbers, one named the "multiplier" $m$ and the other the "multiplicand" $M$, to find a third number
named the "product" $P$, which is the sum of as many numbers equal to the multiplicand as there
are units in the multiplier (multiplication is only a succession of additions!):

$m \times M = \underbrace{M + M + M + \dots + M}_{m \text{ terms}} = \sum_{i=1}^{m} M = P$ (3.65)

The multiplicand and the multiplier are named the "factors of the product".

Multiplication is indicated in kindergarten by the symbol "×", by the raised dot "·"
in higher classes, or even, when no confusion is possible, by nothing at all:

$a \times b = a \cdot b = ab$


We can define multiplication using the Peano axioms in the special case of the natural numbers
$\mathbb{N}$, as we have already mentioned in the section Numbers. With these axioms it is possible
to prove that there exists one and only one (unique) application, denoted "×" or more often
"·", from $\mathbb{N}^2$ to $\mathbb{N}$ satisfying:

$\forall n \in \mathbb{N}: n \cdot 0 = 0$

$\forall p \in \mathbb{N}, \forall q \in \mathbb{N}: p(q + 1) = pq + p$ (3.67)


As this book has not been written for mathematicians, we will skip the proof (relatively long and of little interest for practical purposes) and we will assume that the application "×" exists, is unique and follows from the above properties.

The power is a specific notation for a special case of the multiplication. When the multiplicand(s) and the multiplier(s) are identical in numerical value, we denote the multiplication by (for example):

$$n \cdot n \cdot n \cdot n \cdot n \cdot n \cdot n \cdot n = n^8 \tag{3.68}$$

This is what we name the "power notation" or "exponentiation". The number in superscript is what we name the "power" or the "exponent" of the number. The notation with exponents is said to have been seen for the first time in a book by Chuquet in 1484.

You can check for yourself that its properties are the following (for example):

$$n^x \cdot n^y = n^{x+y} \tag{3.69}$$

and also:

$$a^x \cdot b^x = (ab)^x \tag{3.70}$$

Here are some obvious properties of the multiplication that we will admit without proof (the listing follows the point of view of set properties):



P1. The multiplication of several numbers does not depend on the order of the terms. We then say that the multiplication is a "commutative operation".

P2. The multiplication of several numbers does not change if we replace two or more of them by their intermediate result. We then say that the multiplication is an "associative operation".

P3. The unit is the neutral element of the multiplication, as any multiplicand multiplied by the multiplier 1 is equal to the multiplicand itself.

P4. The multiplication may have a term such that the product is equal to unity (the neutral element). We then say that there exists a "multiplicative inverse" (but strictly speaking this depends on the set of numbers in which we work, as in some of them the concept of decimal number does not exist!).

P5. The multiplication is a "distributive operation", that is to say:

$$a \cdot (b + c) = ab + ac \tag{3.71}$$

the reverse being named a "factorization operation".

Let us also introduce some special notations for the multiplication:

1. Given any numbers $x_1, x_2, \ldots, x_n$ (not necessarily equal), we can write their product as:

$$x_1 \cdot x_2 \cdot \ldots \cdot x_n = \prod_{i=1}^{n} x_i$$

by defining upper and lower bounds for the indexed product (above and below the uppercase Greek letter "Pi").

With respect to this notation we trivially have (on request we can give more details...):

$$\prod_{i=1}^{n} k x_i = k^n \prod_{i=1}^{n} x_i$$

for any number k, since:

$$\prod_{i=1}^{n} k = k^n$$

We also have for example:

$$\prod_{i=1}^{n} (x + y) = (x + y)^n$$


2. We define the "factorial" simply ("simply"... because there also exists a more complex way of defining it, through the Euler Gamma function, as is done in the section Integral and Differential Calculus) by:

$$1 \times 2 \times 3 \times 4 \times \cdots \times n = n!$$



with the special fact that (only the more complex definition mentioned above can make this fact obvious...):

$$0! = 1 \tag{3.77}$$
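As a quick numerical check of the indexed product and of the factorial, here is a minimal Python sketch. The sample values `xs` and `k` and the helper name `factorial` are arbitrary choices made for the illustration; `math.prod` returns 1 for an empty product, which is consistent with the convention 0! = 1.

```python
import math

xs = [2, 3, 5]   # arbitrary sample values x_1, ..., x_n
k = 4            # arbitrary constant factor
n = len(xs)

# Indexed product x_1 * x_2 * ... * x_n
assert math.prod(xs) == 2 * 3 * 5

# A constant factor k comes out of the product as k^n
assert math.prod(k * x for x in xs) == k**n * math.prod(xs)

# The product of n copies of k is k^n
assert math.prod([k] * n) == k**n


def factorial(n: int) -> int:
    """n! as the product 1 x 2 x ... x n (the empty product gives 0! = 1)."""
    return math.prod(range(1, n + 1))


assert factorial(0) == 1
assert factorial(5) == 120 == math.factorial(5)
```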

Let us see some simple examples of basic multiplications:

E1. The multiplication of two relatively small numbers is fairly easy once we have learned to count at least up to the number resulting from this operation. So:

$$5 \times 2 = 10 \qquad 10 \times 3 = 30 \qquad 1014 \times 3 = 3042 \tag{3.78}$$

E2. For much larger numbers we must adopt another method that has to be memorized. For example:

$$4574 \times 8 = 36592$$

This methodology is very logical if you understand how we build a number in base ten. Thus we have (we will assume that the distributive property is mastered):

$$\begin{aligned}
8 \times 4574 &= 8 \times (4 \cdot 10^3 + 5 \cdot 10^2 + 7 \cdot 10^1 + 4 \cdot 10^0) \\
&= 8 \times 4000 + 8 \times 500 + 8 \times 70 + 8 \times 4 \\
&= 32 \cdot 10^3 + 40 \cdot 10^2 + 56 \cdot 10^1 + 32 \cdot 10^0 \\
&= 36592
\end{aligned} \tag{3.80}$$

To avoid overloading the notation in the "vertical" multiplication method, we do not write the zeros, which would burden the calculations unnecessarily (all the more so if the multiplier and/or the multiplicand are very large numbers).
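The base-ten decomposition used above can be reproduced with a short Python sketch. The function name `multiply_by_place_value` is hypothetical and the sketch assumes non-negative integers; it only mirrors the distributive expansion, it is not how multiplication is implemented in practice.

```python
def multiply_by_place_value(multiplier: int, multiplicand: int) -> int:
    """Multiply by distributing the multiplier over the base-ten digits of the multiplicand."""
    if multiplier < 0 or multiplicand < 0:
        raise ValueError("this sketch assumes non-negative integers")
    partial_products = []
    # Walk through the digits from the units upward: digit * 10^power
    for power, digit_char in enumerate(reversed(str(multiplicand))):
        partial_products.append(multiplier * int(digit_char) * 10**power)
    return sum(partial_products)


# 8 x 4574 = 8x4 + 8x70 + 8x500 + 8x4000
print(multiply_by_place_value(8, 4574))  # 36592
```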



3.2.4 Division 

Definition (#47): The division of integers (to start with the simplest case...) is an operation whose aim is, given two integers, one named the "dividend" D and the other the "divisor" d, to find a third number named the "quotient" Q, which is the largest number whose product by the divisor can be subtracted from the dividend (so the division results from repeated subtraction!), the difference being named the "remainder" R or sometimes the "congruence".

In the case of real numbers there is never any remainder at the end of the division (because the quotient multiplied by the divisor always gives exactly the dividend)!

Generally, in the context of integers (or of the division of algebraic equations), if we denote by D the dividend, by d the divisor, by Q the quotient and by R the remainder, we have the relation:

$$D = Q \cdot d + R$$

knowing that the division was initially written as follows:

$$D : d = \frac{D}{d} \tag{3.82}$$


We indicate the operation of division by placing between the two numbers, the dividend and the divisor, the symbol ":" or a slash "/" or even, in kindergarten, the symbol "÷".

We also often refer by the term "fraction" (instead of "quotient") to the ratio of two numbers or, in other words, to the division of the first by the second.

The sign of division is said to be due to Gottfried Wilhelm Leibniz. The slash symbol may have been seen for the first time in the works of Leonardo Fibonacci (1202) and is probably due to the Hindus.

If we divide two numbers and we want an integer as quotient and as remainder (if there is one...), then we speak of "Euclidean division".

For example, dividing a cake is not a Euclidean division because the quotient is not an integer, except if one takes the four quarters...:


[Figure 4.21 - Schematic example of a division (fractions): a cake divided into four parts, with three quarters of the cake shown in gray]

If we have:

$$D : d = D \cdot \frac{1}{d} = D \cdot i_d \tag{3.83}$$

we name $i_d$ the inverse of the divisor. To any nonzero number is associated an inverse that satisfies this condition. From this definition comes the notation (with x being any number other than zero):

$$\frac{x}{x} = x^1 \cdot x^{-1} = x^{1-1} = x^0 = 1 \tag{3.84}$$

In the case of two fractional numbers, we say they are "inverse" or "reciprocal" when their product is equal to unity (as in the previous relations).


R1. A division by zero is what we name a "singularity". That is to say, the result of the division is undetermined!

R2. When we multiply the dividend and the divisor of a division (fraction) by the same number, the quotient does not change (this gives an "equivalent fraction"), but the remainder is multiplied by that number.

R3. Dividing a number by a product of several factors is equivalent to dividing this number successively by each of the factors of the product, and vice versa.

R4. Fractions that are greater than 0 but less than 1 are named "proper fractions". In proper fractions, the numerator is less than the denominator. When a fraction has a numerator that is greater than or equal to the denominator, the fraction is an "improper fraction". An improper fraction is always greater than or equal to 1. Finally, a "mixed number" is a combination of a whole number and a proper fraction.

The properties of division with the condensed power notation (exponentiation) are typically, for example (we leave it to the reader to check this with numerical values):

$$\frac{x \cdot x \cdot x}{y \cdot y} = x^3 \cdot y^{-2}$$




or obviously another example:

$$\frac{x \cdot x \cdot x}{x \cdot x} = \frac{x}{x} \cdot \frac{x}{x} \cdot x = 1 \cdot 1 \cdot x = x^3 \cdot x^{-2} = x^{3-2} = x^1 = x$$

We therefore deduce that: 





Let us recall that a prime number (in the set of relative integers Z) is a number greater than 1 whose only divisors are itself and unity (remember that 2 is prime, for example). Therefore any number greater than 1 that is not prime has at least one prime number as a divisor (1 is excluded by definition!). The smallest divisor greater than 1 of an integer is a prime number (we will detail the properties of prime numbers relative to the operation of division in the section Number Theory).

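Let us see some properties of the division (some of them are already known because they follow logically from the properties of the multiplication):

$$\begin{aligned}
\frac{a}{b} = \frac{c}{d} &\Leftrightarrow a \cdot d = b \cdot c \\
\frac{-a}{b} &= \frac{a}{-b} = -\frac{a}{b} \\
\frac{a}{b} + \frac{c}{d} &= \frac{a \cdot d + b \cdot c}{b \cdot d} \\
\frac{a}{b} \cdot \frac{c}{d} &= \frac{a \cdot c}{b \cdot d} \\
\frac{a/b}{c/d} &= \frac{a \cdot d}{b \cdot c} \\
\frac{a \cdot d}{b \cdot d} &= \frac{a}{b} \cdot \frac{d}{d} = \frac{a}{b} \cdot 1 = \frac{a}{b}
\end{aligned}$$

The passage:

$$\frac{a}{b} = \frac{a \cdot c}{b \cdot c}$$

is what we name a "terms amplification" and:

$$\frac{a}{b} \pm \frac{c}{b} = \frac{a \pm c}{b}$$

is an operation consisting of putting everything over a "common denominator".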




We also have the following properties:

P1. The result of the division of several numbers depends on the order of the terms. We then say that the division is a "non-commutative operation". This means that when a is different from b and both are different from zero, we have in general:

$$\frac{a}{b} \neq \frac{b}{a}$$

P2. The result of the division of several numbers changes if we replace two or more of them by their intermediate result. We then say that the division is a "non-associative operation":

$$\frac{a/b}{c} \neq \frac{a}{b/c}$$




P3. The unit is the neutral element, as when we multiply the dividend or the divisor by 1 the result of the division remains the same:

$$\frac{a}{b} = \frac{1 \cdot a}{b} = \frac{a}{1 \cdot b} = 1 \cdot \frac{a}{b}$$

P4. The division may include a divisor such that the result of the division is equal to unity (neutral element 1). We then say that there exists a "symmetrical to the division", which is obviously equal to the numerator (dividend) itself.

P5. Incrementing the numerator and the denominator by the same constant value does not give the initial ratio in the general case where $a \neq b$:

$$\frac{a}{b} \neq \frac{a + c^{te}}{b + c^{te}}$$
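These properties can be checked numerically with exact rational arithmetic. The following is a minimal sketch using Python's standard `fractions` module; the sample values of `a`, `b`, `c`, `d` are arbitrary and a few numerical checks are of course not a proof.

```python
from fractions import Fraction

a, b, c, d = 3, 4, 5, 6   # arbitrary nonzero sample values

# Sum, product and quotient of two fractions
assert Fraction(a, b) + Fraction(c, d) == Fraction(a * d + b * c, b * d)
assert Fraction(a, b) * Fraction(c, d) == Fraction(a * c, b * d)
assert Fraction(a, b) / Fraction(c, d) == Fraction(a * d, b * c)

# P1: the division is not commutative (a/b differs from b/a here)
assert Fraction(a, b) != Fraction(b, a)

# P2: the division is not associative ((a/b)/c differs from a/(b/c) here)
assert (Fraction(a) / b) / c != Fraction(a) / (Fraction(b) / c)

# P3: multiplying the dividend or the divisor by 1 changes nothing
assert Fraction(1 * a, b) == Fraction(a, 1 * b) == Fraction(a, b)

# P5: adding the same constant to numerator and denominator changes the ratio
assert Fraction(a + 1, b + 1) != Fraction(a, b)
```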


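Now that we know the multiplication (and therefore the power notation) and the division, if we consider that a and b are two positive real numbers different from zero, we have:

$$\frac{a^p}{a^q} = \frac{\overbrace{a \cdot a \cdots a}^{p\text{ times}}}{\underbrace{a \cdot a \cdots a}_{q\text{ times}}} = \underbrace{a \cdot a \cdots a}_{(p-q)\text{ times}} = a^{p-q}$$

and:

$$a^{-q} = \frac{1}{a^q}$$

We also obviously have (this is sometimes named the "zero exponent rule"):

$$\frac{a^n}{a^n} = \frac{\overbrace{a \cdot a \cdots a}^{n\text{ times}}}{\underbrace{a \cdot a \cdots a}_{n\text{ times}}} = \underbrace{a \cdot a \cdots a}_{(n-n)\text{ times}} = a^0 = 1$$

and:

$$(a^n)^m = \underbrace{\underbrace{(a \cdot a \cdots a)}_{n\text{ times}} \cdots \underbrace{(a \cdot a \cdots a)}_{n\text{ times}}}_{m\text{ times}} = \underbrace{a \cdot a \cdots a}_{mn\text{ times}} = a^{mn}$$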




3.2.4.1 n-root

Now that we have introduced, in a simple and not too formal way, the operations of multiplication (and the power notation) and division, we can introduce the concept of n-root.

As we know for example that:

$$2^3 \cdot 2^2 = 2^{3+2}$$

we can by reverse inference also write, for example:

$$2^1 = 2^{0.5 + 0.5} = 2^{0.5} \cdot 2^{0.5} = 2^{1/2} \cdot 2^{1/2}$$

and therefore fractional powers exist! This is what we name the n-root (in the above example we speak of the 2-root).

We can now define the principal n-root of any number!

Definition (#48): In mathematics, the nth root of a number x, where n is a positive integer, is a number r which, when raised to the power n, yields x. That is to say, such that $r^n = x$, where n is the degree of the root. By convention we write:

$$r = x^{1/n} := \sqrt[n]{x}$$

Roots are usually written using the "radical" symbol $\sqrt{\ }$, also named the "radix". The number $n \in \mathbb{N}$ is named the "index" and the number x under the radical is named the "radicand". From what has been said for the powers, we can easily conclude that the n-th root of a product of several factors is the product of the n-th roots of each factor:

$$\sqrt[n]{a} \cdot \sqrt[n]{b} = \sqrt[n]{a \cdot b}$$

as (seen previously):

$$(a^p)^q = a^{p \cdot q} = (a^q)^p$$

And therefore:

$$a^{p/q} = \sqrt[q]{a^p} = \left(\sqrt[q]{a}\right)^p \quad \text{and} \quad \sqrt[p]{\sqrt[q]{a}} = \sqrt[pq]{a}$$


Obviously, for $a \geq 0$, it comes:

$$\sqrt[n]{a^n} = a$$

We also have, if a < 0:

$$\sqrt[n]{a^n} = a$$

if $n \in \mathbb{N}^*$ is odd and:

$$\sqrt[n]{a^n} = |a|$$

if $n \in \mathbb{N}^*$ is even.

If x < 0 and $n \in \mathbb{N}^*$ is odd, then:

$$y = x^{1/n} = \sqrt[n]{x}$$

is the number y such that:

$$y^n = x$$

If $n \in \mathbb{N}^*$ is even then obviously, as we have already seen earlier, the root belongs to C (see section Numbers).

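If the denominator of a fraction contains a factor of the form $\sqrt[n]{a^k}$ with $a \neq 0$, then by multiplying the numerator and the denominator by $\sqrt[n]{a^{n-k}}$ we remove the root from the denominator, since:

$$\frac{x}{\sqrt[n]{a^k}} = \frac{x \sqrt[n]{a^{n-k}}}{\sqrt[n]{a^k} \sqrt[n]{a^{n-k}}} = \frac{x \sqrt[n]{a^{n-k}}}{\sqrt[n]{a^{k+n-k}}} = \frac{x \sqrt[n]{a^{n-k}}}{\sqrt[n]{a^n}} = \frac{x \sqrt[n]{a^{n-k}}}{|a|}$$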



Let us see a world-famous example of the application of roots: the origin of the ISO paper formats A6, A5, A4, A3, A2, A1, A0, etc.

This paper format has the property (this was the goal from the start!) of keeping its proportions when we fold or cut the sheet in half along its largest dimension. Thus, if we denote by L the length and by W the width of the sheet, we have:

$$\frac{L}{W} = \frac{W}{L/2} \Leftrightarrow L^2 = 2W^2$$

Hence we have:

$$L = \sqrt{2}\, W$$

The A0 format has by definition an area of 1 [m²]. For this format we then have:

$$LW = 1\ [\text{m}^2]$$

Therefore we deduce that:

$$LW = \sqrt{2}\, W^2 = \frac{L^2}{\sqrt{2}} = 1\ [\text{m}^2]$$

and therefore:

$$W = 2^{-1/4} \cdot 1\ [\text{m}] \cong 84.1\ [\text{cm}]$$

from whence we derive:

$$L = 2^{1/4} \cdot 1\ [\text{m}] \cong 118.9\ [\text{cm}]$$
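The same reasoning gives the other formats of the series, since each halving divides both dimensions by the square root of 2. The sketch below computes the unrounded dimensions; the function name `a_series_size_mm` is hypothetical, and the official standard rounds the values to whole millimetres, which this sketch does not do.

```python
import math


def a_series_size_mm(n: int) -> tuple[float, float]:
    """Unrounded (length, width) in millimetres of the format A<n>."""
    length_a0 = 2 ** 0.25 * 1000    # L = 2^(1/4) m
    width_a0 = 2 ** -0.25 * 1000    # W = 2^(-1/4) m
    scale = math.sqrt(2) ** n       # each halving divides both sides by sqrt(2)
    return length_a0 / scale, width_a0 / scale


for n in range(0, 5):
    length, width = a_series_size_mm(n)
    # The proportions L / W = sqrt(2) are preserved at every step
    assert math.isclose(length / width, math.sqrt(2))
    print(f"A{n}: {length:.1f} mm x {width:.1f} mm")
```

With these unrounded values, A0 comes out at about 1189.2 mm by 840.9 mm and A4 at about 297.3 mm by 210.2 mm.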




3.3 Arithmetic Polynomials 

Definition (#49): An "arithmetic polynomial" (not to be confused with the "algebraic polynomial" that will be studied later in the section Algebra) is a set of numbers separated from each other by the operators of addition or subtraction (+ or -), therefore also including the multiplication...

The components enclosed in the polynomial are known as the "terms" of the polynomial. When the polynomial contains a single term, we speak of a "monomial"; if there are two terms we speak of a "binomial", and so on...

Theorem 4.10. The value of an arithmetic polynomial is equal to the excess of the sum of the terms preceded by the + sign over the sum of the terms preceded by the - sign.

Proof 4.10.1.

$$\begin{aligned}
n_1 - n_2 + n_3 - n_4 + n_5 - n_6 + \ldots - n_{j-1} + n_j = {} & (n_1 + n_3 + n_5 + \ldots + n_j) \\
& + (-1)(n_2 + n_4 + n_6 + \ldots + n_{j-1})
\end{aligned}$$


whatever the values of the terms. 

□ Q.E.D. 

Factoring out the negative unit -1 is what we name, as we already know, a "factorization". The reverse operation is named, as we also already know, a "distribution" or "development".

The product of several polynomials can always be replaced by a single polynomial that we name the... "resulting product". We usually operate as follows: we multiply successively each term of the first polynomial, starting from the left, by the first, the second, ..., the last term of the second polynomial. We obtain a first partial product. We do, if necessary, a reduction (simplification) of similar terms. We then multiply each of the terms of the partial product successively by the first, the second, ..., the last term of the third polynomial, starting from the left, and so on.

$$\begin{aligned}
P_1 \cdot P_2 \cdot P_3 &= (a + b + c)(d + e + f)(g + h + i) \\
&= a(d + e + f)(g + h + i) + b(d + e + f)(g + h + i) \\
&\quad + c(d + e + f)(g + h + i) \\
&= ad(g + h + i) + ae(g + h + i) + af(g + h + i) + bd(g + h + i) \\
&\quad + be(g + h + i) + bf(g + h + i) + cd(g + h + i) + ce(g + h + i) \\
&\quad + cf(g + h + i) \\
&= adg + adh + adi + aeg + aeh + aei + afg + afh + afi + bdg + bdh \\
&\quad + bdi + beg + beh + bei + bfg + bfh + bfi + cdg + cdh + cdi + ceg + ceh \\
&\quad + cei + cfg + cfh + cfi
\end{aligned}$$



The product of the polynomials $P_1, P_2, P_3, \ldots, P_k, \ldots$ is the sum of all the products of factors formed with one term of $P_1$, one term of $P_2$, ..., one term of $P_k$, and so on. If there is no reduction, the number of terms is equal to the product of the numbers of terms $n_i$ of each polynomial, such that the final number of terms is equal to:

$$n = \prod_i n_i \tag{3.119}$$
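The counting rule can be illustrated with the three trinomials used in the expansion. This is a minimal sketch using `itertools.product` from the Python standard library; the terms are handled as plain letters and no reduction of similar terms is attempted.

```python
from itertools import product
from math import prod

p1, p2, p3 = ["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]

# One term of the resulting product for each choice of one term per polynomial
terms = ["".join(factors) for factors in product(p1, p2, p3)]

print(" + ".join(terms[:4]), "+ ...")   # adg + adh + adi + aeg + ...
print(len(terms))                       # 27

# Without reduction, the number of terms is the product of the term counts
assert len(terms) == prod(len(p) for p in (p1, p2, p3))
```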


3.4 Absolute Value 

Definition (#50): In mathematics, the "absolute value" |x| of a real number x is the non-negative value of x without regard to its sign. Namely, |x| = x for a positive x, |x| = -x for a negative x (in which case -x is positive), and |0| = 0. For example, the absolute value of 3 is 3, and the absolute value of -3 is also 3. The absolute value of a number may be thought of as its distance from zero.


R1. The term absolute value has been used in this sense since at least 1806. The notation |x|, with a vertical bar on each side, was introduced by Karl Weierstrass in 1841.

R2. For plots of the absolute value the reader is referred to the Functional Analysis section of this book.

For any real number x, the "absolute value" |x| is formally given by:

$$|x| = \begin{cases} +x & \text{if } x > 0 \\ -x & \text{if } x < 0 \\ 0 & \text{if } x = 0 \end{cases}$$

Originally the absolute value was defined as:

$$|x| = \sqrt{x^2}$$

We also notice the following possible notation:

$$|x| = \max(-x, x)$$

and the relations:

$$x \leq |x|$$

$$|-x| = |x|$$

and also:

$$|x| \leq y \Leftrightarrow -y \leq x \leq y \tag{3.125}$$

$$|x| \geq y \Leftrightarrow x \leq -y \lor x \geq y \tag{3.126}$$








the latter being often used in the context of solving inequalities. 


An inequality such as:

$$|x - 3| \leq 9$$

is solved simply by using the intuitive concept of distance. The solution is the set of real numbers whose distance from the real number 3 is less than or equal to 9. This is the interval of center 3 and radius 9, or formally:

$$[3 - 9, 3 + 9] = [-6, 12]$$
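The distance interpretation translates directly into code. In this minimal sketch the function name `solve_abs_le` is hypothetical; it returns the closed interval of the solutions and a few sample points are then tested against the inequality itself.

```python
def solve_abs_le(center: float, radius: float) -> tuple[float, float]:
    """Closed interval of all x such that |x - center| <= radius (radius >= 0)."""
    if radius < 0:
        raise ValueError("no solution for a negative radius")
    return center - radius, center + radius


low, high = solve_abs_le(3, 9)
print(low, high)  # -6 12

# Points inside, on the boundary and outside the interval
for x in (-7, -6, 0, 12, 12.5):
    assert (abs(x - 3) <= 9) == (low <= x <= high)
```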


Let us indicate that it is also useful to interpret the term:

$$|x - y| = \sqrt{(x - y)^2}$$

as the (Euclidean!) distance between the two numbers x and y on the real line. Thus, by equipping the set of real numbers with the absolute-value distance, it becomes a metric space (see the section Topology for a robust introduction to what a distance is)!

The absolute value has some trivial properties that we will give without proof (except on reader request) as they seem to us quite intuitive.

The absolute value has the following four fundamental properties:

P1. Non-negativity:

$$|x| \geq 0$$

P2. Positive-definiteness:

$$|x| = 0 \Leftrightarrow x = 0$$

P3. Multiplicativeness:

$$|xy| = |x||y|$$

P4. Subadditivity ("first" triangle inequality):

$$|x + y| \leq |x| + |y|$$

Other important properties of the absolute value include:

P5. Idempotence (the absolute value of the absolute value is the absolute value):

$$||x|| = |x|$$



P6. Evenness (reflection symmetry of the graph):

$$|-x| = |x|$$

P7. Preservation of division (equivalent to multiplicativeness) if $y \neq 0$:

$$\left|\frac{x}{y}\right| = \frac{|x|}{|y|}$$

P8. Reverse ("second") triangle inequality (equivalent to subadditivity):

$$|x - y| \geq \big||x| - |y|\big|$$






3.5 Calculation Rules (operators priorities) 

Frequently in computing (in software development in particular), we speak of "operator precedence". In mathematics we speak of "priority of the sets of operations and rules of signs". What is this about?

We have already seen the properties of addition, subtraction, multiplication, division and power. We therefore insist that the reader distinguish the concept of "property" from that of "priority" (that we will see immediately), which are (obviously) two completely different things!

In mathematics, in particular, we first define the priorities of the symbols {[()]}:

1. Operations that are in parentheses () should be performed first in the polynomial.

2. Operations that are in brackets [ ] should be performed afterwards, from the results of the operations that were in parentheses ().

3. Finally, from the intermediate results of the operations that were in parentheses () and brackets [ ], we calculate the operations that are between the braces {}.

Let us do an example; this will be more telling.


Consider the calculation of the polynomial:

$$\{[5 \cdot (8 + 2) + 3 \cdot [4 + (8 + 6) \cdot 2]] \cdot (1 + 9)\} \cdot 7 + 1 \tag{3.138}$$

According to the rules we defined earlier, we first calculate all the elements that are in parentheses (), that is to say:

$$(8 + 2) = 10 \qquad (8 + 6) = 14 \qquad (1 + 9) = 10 \tag{3.139}$$

which gives us:

$$\{[5 \cdot 10 + 3 \cdot [4 + 14 \cdot 2]] \cdot 10\} \cdot 7 + 1 \tag{3.140}$$

Still according to the rules we defined earlier, we now calculate all the elements between brackets, always starting with the brackets [ ] nested at the deepest level inside the other brackets [ ]. Thus, we first calculate the expression $[4 + 14 \cdot 2]$, which is nested in the top-level bracket $[5 \cdot 10 + 3 \cdot \ldots]$.

This gives us $[4 + 14 \cdot 2] = 32$ and therefore:

$$\{[5 \cdot 10 + 3 \cdot 32] \cdot 10\} \cdot 7 + 1 \tag{3.141}$$


It remains for us to calculate $[5 \cdot 10 + 3 \cdot 32] = 146$ and therefore:

$$\{146 \cdot 10\} \cdot 7 + 1$$

We now calculate the single term in braces, which gives us:

$$\{146 \cdot 10\} = 1460$$

Finally it remains:

$$1460 \cdot 7 + 1 = 10221$$
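The successive steps can be replayed in Python, where only parentheses are available as grouping symbols, so the brackets and braces of the example are all written as parentheses. The variable names are chosen for this illustration only.

```python
# Step 1: the innermost parentheses
first, second, third = 8 + 2, 8 + 6, 1 + 9       # 10, 14, 10

# Step 2: the innermost bracket, then the top-level bracket
inner_bracket = 4 + second * 2                    # 32
outer_bracket = 5 * first + 3 * inner_bracket     # 146

# Step 3: the braces, then what remains outside
braces = outer_bracket * third                    # 1460
result = braces * 7 + 1                           # 10221

# The whole expression written on a single line gives the same value
assert result == ((5 * (8 + 2) + 3 * (4 + (8 + 6) * 2)) * (1 + 9)) * 7 + 1
print(result)  # 10221
```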




Obviously this is a special case ... But the idea remains the same in general. 

The priority of arithmetic operators is a problem mainly related to computer languages (as we have already mentioned), because there we can only write a mathematical relation on a single line, and this is often a source of confusion for people without technical training. For example, the relation:

$$\frac{-a \cdot (b + c)^d}{e^f} - g$$

will be written (in pretty much all computer languages):

-a * (b + c) ^ d / e ^ f - g

An uninitiated reader could read this in many ways:

— a ■ (b + c) d 

(b + c) 

~ 9 \ 

(b + c)] c 

- a ; 



[-a - ( b + c)]ef - g 


Thus an order of priority of the operators has logically been defined, such that the operations are carried out in the following order:

1. - Negation

2. ^ Power

3. * Multiplication and / division

4. \ Integer division (specific to computer science)

5. mod Modulo (see section Number Theory)

6. +, - Addition and subtraction

Obviously the rules of parentheses (), brackets [ ] and braces {} that were defined in mathematics also apply to computing.

Thus we get, in order (we replace every operation performed with a symbol):



First the terms in parentheses:

-a * (b + c) ^ d / e ^ f - g = -a * α ^ d / e ^ f - g

1. First the negation (rule 1):

-a * α ^ d / e ^ f - g = β * α ^ d / e ^ f - g (3.149)

2. The power (rule 2):

β * α ^ d / e ^ f - g = β * χ / δ - g (3.150)

3. We apply the multiplication (rule 3):

β * χ / δ - g = ε / δ - g (3.151)

4. And we apply the division (rule 3 again):

ε / δ - g = φ - g (3.152)

The rules (4) and (5) do not apply to this particular example.

5. And finally (rule 6):

φ - g = ν (3.153)

Thus, following these rules, neither a computer nor a human can (should) be wrong in interpreting an equation written on a single line.
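The step-by-step evaluation can be reproduced in Python, where the power operator is written `**` instead of `^`. The numerical values and the intermediate names (alpha, beta, ...) below are assumptions made for this sketch. Note that in Python the power binds more tightly than the unary minus (`-a ** 2` means `-(a ** 2)`), which differs from rule 1 of the list above; this makes no difference here because the negated operand is not the base of a power.

```python
a, b, c, d, e, f, g = 2.0, 1.0, 2.0, 2.0, 3.0, 2.0, 5.0  # arbitrary sample values

alpha = b + c           # parentheses first
beta = -a               # rule 1: negation
chi = alpha ** d        # rule 2: power (numerator)
delta = e ** f          # rule 2: power (denominator)
epsilon = beta * chi    # rule 3: multiplication
phi = epsilon / delta   # rule 3 again: division
nu = phi - g            # rule 6: subtraction

# Python's own precedence rules give the same value for the one-line expression
assert nu == -a * (b + c) ** d / e ** f - g
print(nu)  # -7.0
```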

In computer code, however, there are several operators that we do not always find in pure mathematics and whose order of priority frequently changes from one computer language to another. We will not dwell too much on this, as it is almost endless; here is nevertheless a short description:

• The concatenation operator "&" is evaluated before the comparison operators.

• Comparison operators (=, <, >, ...) all have equal priority.

However, the leftmost operator in an expression holds a higher priority.

The logical operators are evaluated in the following order of priority in most computing languages:

1. Not (¬)

2. And (∧)

3. Or (∨)


4. Xor (⊕)

5. Eqv (⇔)

6. Imp (⇒)

Now that we have seen the operator priorities, what are the rules of signs applicable in mathematics and in computing science?

First, you must know that these rules only apply in the case of multiplication and division. Given two positive numbers (+x), (+y), we have:

$$(+x) \cdot (+y) = +(x \cdot y) \tag{3.154}$$

In other words, the multiplication of two positive numbers is a positive number, and this can be generalized to the multiplication of n positive numbers.

We have:

$$(-x) \cdot (+y) = (+x) \cdot (-y) = -(x \cdot y) \tag{3.155}$$

In other words, the multiplication of a positive number by a negative number is negative.

We have:

$$(-x) \cdot (-y) = +(x \cdot y) \tag{3.156}$$

In other words, the product of two negative numbers is positive. This can be generalized to all n numbers included in a multiplication: the result is positive if there is an even number of negative numbers and negative if there is an odd number of negative numbers.
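The generalized rule of signs amounts to counting the negative factors. A minimal Python sketch follows; the function name `product_sign` is hypothetical and the factors are assumed to be nonzero.

```python
import math


def product_sign(numbers: list[float]) -> int:
    """+1 for an even number of negative factors, -1 for an odd number (no zero factor)."""
    if any(x == 0 for x in numbers):
        raise ValueError("this sketch assumes nonzero factors")
    negatives = sum(1 for x in numbers if x < 0)
    return 1 if negatives % 2 == 0 else -1


values = [-2, 3, -5, -7]   # three negative factors: the product is negative
assert product_sign(values) == -1
assert math.copysign(1, math.prod(values)) == product_sign(values)
```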

About divisions, the reasoning is the same:

$$\frac{(+x)}{(+y)} = +\frac{x}{y}$$

In other words, if the numerator and denominator are positive, then the result of the division will be positive.

We have:

$$\frac{(+x)}{(-y)} = \frac{(-x)}{(+y)} = -\frac{x}{y}$$

In other words, if either the numerator or the denominator (but not both) is negative, then the result of the division will necessarily be negative.

We have:

$$\frac{(-x)}{(-y)} = +\frac{x}{y}$$









In other words, if the numerator and denominator are both negative, then the result of the division will necessarily be positive.

Obviously if we have a subtraction of terms, it is possible to rewrite it in the form:

$$x - y = x + (-1)y = -1 \cdot (-x + y) \tag{3.160}$$



Number Theory 

Traditionally, number theory is a branch of mathematics that deals with the properties of integers, whether natural or relative. More generally, the field of study of this theory concerns a broad class of problems that naturally come from the study of integers. Number theory can be divided into several branches of study (algebraic number theory, computational number theory, etc.) depending on the methods used and the issues addressed.

The cross sign "×" for multiplication is said to appear for the first time in a book by Oughtred (1631); as for the raised dot (the modern notation for multiplication), we owe it to Leibniz. As early as 1544, Stiefel, in one of his books, did not employ any sign and designated the product of two numbers by placing them next to each other.

We chose to introduce in this section only the subjects that are essential to the study of the mathematics and theoretical physics of this book, as well as those that absolutely must be part of the general culture of the engineer (some results have applications in Biostatistics!).

4.1 Principle of good order 

We will take for granted the principle which says that every non-empty set $S \subset \mathbb{N}$ contains a smallest element.

We can use this principle to prove an important property of numbers named the "Archimedean property" or "Archimedes' axiom", which states:

For all $a, b \in \mathbb{N}$ where a is non-zero, there is at least one positive integer n such that:

$$n \cdot a \geq b \tag{4.1}$$

In other words, for two unequal values, there is always an integer multiple of the smaller one that is bigger than the larger one. We name "Archimedean" the structures whose elements satisfy a comparable property (see section Set Theory).

While this is trivial to understand in the case of integers, let us prove it, because it allows us to see the type of approach used by mathematicians when they must prove trivial statements like this...

Proof 4.10.2. Let us suppose the opposite, namely that for all $n \in \mathbb{N}$ we have:

$$n \cdot a < b \tag{4.2}$$

If we can prove that this is absurd, then we will have proved the Archimedean property (also when a, b are real).


Let us consider then the set:

$$S = \{b - na \mid n \in \mathbb{N}\} \tag{4.3}$$

By hypothesis $na < b$ for all n, so every element of S is a positive integer and S is a non-empty subset of N. Using the principle of good order, we deduce that there exists $s_0 \in S$ such that $s_0 \leq s$ for all $s \in S$. Let us write this smallest element as:

$$s_0 = b - n_0 a$$

and therefore we also have:

$$b - (n_0 + 1)a \in S$$

As $s_0$ is the smallest element of S, we must have:

$$b - (n_0 + 1)a \geq b - n_0 a$$

and if we reorganize and simplify (a being positive):

$$-(n_0 + 1) \geq -n_0$$

and if we simplify the negative sign we get:

$$n_0 + 1 \leq n_0$$

an obvious contradiction!

This contradiction shows that the initial assumption, $na < b$ for all n, is false, and therefore the Archimedean property is proved by contradiction.

□ Q.E.D. 
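The Archimedean property guarantees that the following search always terminates. This is a minimal sketch for positive integers; the function name `archimedean_multiple` is hypothetical, and the loop is deliberately naive to stay close to the statement (a ceiling division would give the same result directly).

```python
def archimedean_multiple(a: int, b: int) -> int:
    """Smallest positive integer n such that n * a >= b, for a > 0."""
    if a <= 0:
        raise ValueError("a must be a positive integer")
    n = 1
    while n * a < b:   # terminates because some multiple of a reaches b
        n += 1
    return n


print(archimedean_multiple(3, 20))  # 7, since 7 * 3 = 21 >= 20
```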

4.2 Induction Principle 

Let S be a set of natural numbers that has the following two properties:

P1. $1 \in S$

P2. If $k \in S$, then $k + 1 \in S$

then:

$$S = \mathbb{N} \setminus \{0\} = \mathbb{N}^* \tag{4.9}$$

This is how we build the set of natural numbers (refer to the section Set Theory for the rigorous construction of the set of natural numbers with the Zermelo-Fraenkel axioms).

Theorem 4.11. Given now:

$$B = \mathbb{N}^* \setminus S \tag{4.10}$$

the symbol "\" meaning, as a reminder, "excluding". We want to prove that:

$$B = \emptyset \tag{4.11}$$


Again, even if it is trivial to understand, let us do the proof, because it allows us to see the type of approach used by mathematicians when they must prove trivial statements like this...

Proof 4.11.1. Let us suppose the opposite, that is to say:

$$B \neq \emptyset \tag{4.12}$$

By the principle of good order, since $B \subset \mathbb{N}$, B must have a smallest element, which we will denote by $b_0$.

But since $1 \in S$ by the property (P1), we have $b_0 > 1$ and of course also $b_0 - 1 \notin B$ ($b_0$ being the smallest element of B), that is to say $b_0 - 1 \in S$. By using the property (P2), we finally have $b_0 \in S$, that is to say $b_0 \notin B$, and therefore we get a contradiction.

□ Q.E.D. 


We want to show, thanks to the induction principle, that the sum of the first n squares equals n(n + 1)(2n + 1)/6, that is to say that for $n \geq 1$ we have (see section Sequences and Series):

$$1^2 + 2^2 + \ldots + n^2 = \sum_{i=1}^{n} i^2 = \frac{n(n + 1)(2n + 1)}{6}$$

First, the above relation is easily verified for n = 1. We will now show that it also holds for n = k + 1 if it holds for n = k. Under the induction hypothesis:

$$\begin{aligned}
1^2 + 2^2 + \ldots + k^2 + (k + 1)^2 &= \sum_{i=1}^{k} i^2 + (k + 1)^2 = \frac{k(k + 1)(2k + 1)}{6} + (k + 1)^2 \\
&= \frac{(k + 1)(k + 2)(2k + 3)}{6}
\end{aligned} \tag{4.14}$$

We thus fall back on the first relation but with n = k + 1, hence the result.
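A numerical check is of course not a proof by induction, but it is a convenient sanity check of both the closed form and the induction step. In this sketch the function name `sum_of_squares_closed_form` and the tested range are arbitrary choices.

```python
def sum_of_squares_closed_form(n: int) -> int:
    """n(n + 1)(2n + 1) / 6, which is always an integer."""
    return n * (n + 1) * (2 * n + 1) // 6


for k in range(1, 200):
    # The closed form agrees with the direct sum 1^2 + 2^2 + ... + k^2
    assert sum(i * i for i in range(1, k + 1)) == sum_of_squares_closed_form(k)
    # Induction step: adding (k + 1)^2 gives the closed form for k + 1
    assert (sum_of_squares_closed_form(k) + (k + 1) ** 2
            == (k + 1) * (k + 2) * (2 * k + 3) // 6)
```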

This proof technique is therefore of great importance in the study of arithmetic. Observation and induction have often led to suspecting laws that would have been more difficult to find a priori. We then verify the accuracy of the formulas by the previous method, which gave birth to modern algebra through the studies of Fermat and Pascal on Pascal's triangle (see section Calculus).



4.3 Divisibility 

Definition (#51): Given $A, B \in \mathbb{Z}$ with $A \neq 0$. We say that "A divides B (without remainder)" if there is an integer q (the quotient) such that:

$$B = Aq \tag{4.15}$$

in which case we write, to differentiate it from the classical division:

$$A \mid B$$

Otherwise, we write:

$$A \nmid B$$

and we say that "A does not divide B".

Moreover, if $A \mid B$, we also say that "B can be divided by A" or "B is a multiple of A".

In the case where $A \mid B$ and 1 < A < B, we will say that A is a "proper divisor" of B. Moreover, it is clear that $A \mid 0$ regardless of $A \in \mathbb{Z} \setminus \{0\}$ (for A = 0 we would have a singularity). Here are some basic theorems relating to the division:

Theorem 4.12. If $A \mid B$, then $A \mid BC$ whatever $C \in \mathbb{Z}$. Or more formally:

$$\forall C \in \mathbb{Z} : A \mid B \Rightarrow A \mid BC \tag{4.18}$$

Proof 4.12.1. If $A \mid B$, then there exists an integer q such that:

$$B = Aq \tag{4.19}$$

Hence:

$$BC = (Aq)C = A(qC) \tag{4.20}$$

and therefore:

$$A \mid BC$$

□ Q.E.D.


Theorem 4.13. If A\B and B\C, then A\C or more formally: 

A\B A A\B => A\C 


Proof 4.13.1. If A\B and B C then, there exists two integers q and r such that B 
C = Br. More formally: 

= Aq and 

A \B A B\C — > 3 (q, r) E N : B = Aq A C = Br 



C = A(qr) 


and hence: 

C = A\C 


□ Q.E.D. 

= Aq and 



□ Q.E.D. 

= Aq and 

We then have: 

Theorem 4.14. If A\B and A\C then: 

A\(Bx + Cy) Vx,y£Z 

Proof 4.14.1. If A\B and A\C then, there exists two integers q and r such that B 
C — Ar. It follows: 

Bx + Cy = ( Aq)x + ( Ar)y = A(qx + ry ) 

and therefore: 

A\(Bx + Cy) Vx,y 6 Z 

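Theorem 4.15. If $A \mid B$ and $B \mid A$ then:

$A = \pm B$

Proof 4.15.1. If $A \mid B$ and $B \mid A$ then there exist two integers q and r such that $B = Aq$ and
$A = Br$. We then have:

$B = B(qr)$ (4.30)

and thus $qr = 1$ (B is non-zero since $A = Br \neq 0$). This is possible only if $q = r = \pm 1$, and thus:

$A = \pm B$ (4.31)

□ Q.E.D.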

Theorem 4.16. If $A \mid B$ and $B \neq 0$ then:

$|B| \geq |A|$

Proof 4.16.1. If $A \mid B$ then there exists an integer $q \neq 0$ such that $B = Aq$. But then:

$|B| = |A||q| \geq |A|$

as $|q| \geq 1$.

□ Q.E.D.
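The divisibility theorems above can be checked numerically on a small range of integers. The following is only an illustrative sketch, not a proof: the helper name `divides` and the tested range are assumptions made for this example.

```python
def divides(a: int, b: int) -> bool:
    """Return True when a | b, i.e. b = a*q for some integer q (a must be non-zero)."""
    if a == 0:
        raise ValueError("a must be non-zero")
    return b % a == 0

# Brute-force check of Theorems 4.12 to 4.16 on a small range of integers.
R = range(-6, 7)
for a in R:
    if a == 0:
        continue
    for b in R:
        if not divides(a, b):
            continue
        # Theorem 4.12: a | b implies a | b*c for every c.
        assert all(divides(a, b * c) for c in R)
        # Theorem 4.16: a | b and b != 0 imply |b| >= |a|.
        assert b == 0 or abs(b) >= abs(a)
        # Theorem 4.15: a | b and b | a imply a = +b or a = -b.
        if b != 0 and divides(b, a):
            assert abs(a) == abs(b)
        for c in R:
            # Theorem 4.13: a | b and b | c imply a | c.
            if b != 0 and divides(b, c):
                assert divides(a, c)
            # Theorem 4.14: a | b and a | c imply a | (b*x + c*y).
            if divides(a, c):
                assert all(divides(a, b * x + c * y) for x in R for y in R)
```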



4.3.1 Euclidean Division 

The Euclidean division is an operation that, to two integers named respectively the "dividend"
and the "divisor", associates two other integers named the "quotient" and the "remainder". Initially
defined only for nonzero integers, it can be generalized to relative integers and polynomials, for
example.

Definition (#52): We name "Euclidean division" or "integer division" of two numbers A and B
the operation of dividing B by A, stopping when the remainder is strictly less than A.

Let us recall (see section Numbers) that any number which admits exactly two Euclidean divisors
(such that the division gives no remainder), namely 1 and itself, is named a "prime number"
(which excludes the number 1 from the list of primes), and that any pair of numbers which have
only 1 as common Euclidean divisor are said to be "relatively prime", "mutually prime", or
"coprime".

Theorem 4.17. Given $A, B \in \mathbb{Z}$ with $A > 0$. The "theorem of the Euclidean division" states
that there are unique integers q (quotient) and r (remainder) such that:

$B = Aq + r \Leftrightarrow r = B - Aq$ (4.34)

where $0 \leq r < A$. Furthermore, if $A \nmid B$, then $0 < r < A$.


One cake with 9 parts (B): we have to divide it between 4 people (A), each receiving
q = 2 parts, with one part remaining (r = 1).

Figure 4.22 - The pie has 9 slices, so each of the 4 people receives 2 slices and 1 is left over.

and therefore:

$9 = 2 \cdot 4 + 1$


Proof 4.17.1. Let us consider the set:

$S = \{r = B - qA \mid q, B \in \mathbb{Z}, A \in \mathbb{Z}^*, B - qA \geq 0\}$ (4.36)


It is relatively easy to see that $S \subset \mathbb{N}^* \cup \{0\}$ and that $S \neq \emptyset$; hence, according to the
well-ordering principle, we conclude that S contains a smallest element $r \geq 0$. Let q be the integer satisfying:

$r = B - Aq$

We first want to show that $r < A$ by assuming the opposite (proof ad absurdum), that is to say that
$r \geq A$. In this case, we have:

$B - qA = r \geq A$

which is equivalent to:

$B - (q + 1)A = r - A \geq 0$

but $B - (q + 1)A \in S$ and:

$B - (q + 1)A < B - qA$

This contradicts the fact that:

$r = B - qA$

is the smallest element of S. So $r < A$. Finally, it is clear that if $r = 0$, we have $A \mid B$, hence the
second statement of the theorem.

□ Q.E.D.


In the statement of the Euclidean division, we assumed that $A > 0$. What do we get when
$A < 0$? In this situation, $-A$ is obviously positive, and we can then apply the Euclidean
division to B and $-A$. Therefore, there are integers q and r such that:

$B = q(-A) + r$

where $0 \leq r < |A|$. But this relation can be written:

$B = (-q)A + r$

where obviously $-q$ is an integer. The conclusion is that the Euclidean division can be
stated in a more general form.

Given $A, B \in \mathbb{Z}$ with $A \neq 0$, there exist two integers q and r such that:

$B = Aq + r$

where $0 \leq r < |A|$. Furthermore, if $A \nmid B$, then $0 < r < |A|$.



The integers q and r are unique in the Euclidean division. Indeed, if there are two other integers
q' and r' such that:

$B = Aq' + r'$




always with $0 \leq r' < |A|$, then:

$A(q' - q) = r - r'$

and therefore:

$A \mid (r - r')$

Following Theorem 4.16 we have, if $r - r' \neq 0$, that $|r - r'| \geq |A|$.

But this last inequality is impossible, as by construction $-|A| < r - r' < |A|$. Therefore $r = r'$ and, as
$A \neq 0$, then $q' = q$, hence the uniqueness.
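As a minimal sketch of the general form of the Euclidean division (divisor of either sign, remainder in $0 \leq r < |A|$), one may write the following. The function name `euclidean_division` and the argument order (dividend first) are assumptions of this example; Python's built-in `divmod` is only used with a positive divisor, where it already returns a non-negative remainder.

```python
def euclidean_division(b: int, a: int) -> tuple[int, int]:
    """Return (q, r) with b = a*q + r and 0 <= r < |a|, for a != 0."""
    if a == 0:
        raise ValueError("the divisor a must be non-zero")
    # For a positive divisor, divmod already gives 0 <= r < |a|.
    q, r = divmod(b, abs(a))
    if a < 0:
        # b = q*(-a) + r is rewritten as b = (-q)*a + r, as in the remark above.
        q = -q
    return q, r
```

For instance, `euclidean_division(9, 4)` gives the quotient 2 and remainder 1 of the cake example.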

4.3.1.1 Greatest common divisor

The greatest common divisor (gcd) (also known as greatest common factor (gcf), highest
common factor (hcf), greatest common measure (gcm), or highest common divisor) of
two or more integers, when at least one of them is not zero, is the largest positive integer that
divides the numbers without a remainder.

Definition (#53): Given $a, b \in \mathbb{Z}$ such that $ab \neq 0$. The "greatest common divisor" (gcd) of a and
b, denoted:

$(a, b)$

is the positive integer d that satisfies the following two properties:

P1. $d \mid a$ and $d \mid b$ (so without remainder r in the division!)
P2. If $c \mid a$ and $c \mid b$ then $c \leq d$ and $c \mid d$ (by division!)

Note that 1 is always a common divisor of two arbitrary integers.



Let us consider the positive integers 36 and 54. A common divisor of 36 and 54
is a positive integer that divides both 36 and 54. For example, 1 and 2 are common
divisors of 36 and 54.

Div 36 = {1, 2, 3, 4, 6, 9, 12, 18, 36}

Div 54 = {1, 2, 3, 6, 9, 18, 27, 54}

The intersection (represented in the original by a Venn diagram) gives the following set of common divisors:

Div 36 ∩ Div 54 = {1, 2, 3, 6, 9, 18}

However, it is not necessarily obvious that the greatest common divisor other than 1 (that is to
say different from 1) of two integers a and b that are not relatively prime always exists. This is
proved by the following theorem (if the gcd exists, it is by definition unique!), named
"Bezout theorem", which also gives the opportunity to prove other interesting properties of
two numbers, as we shall see later.

Theorem 4.18. Given $a, b \in \mathbb{Z}$ such that $ab \neq 0$. If d is the greatest common divisor of a and b
(so d divides both a and b without remainder r!), then there must exist two integers x and y such that:

$d = (a, b) = ax + by$


This relation is named the "Bezout identity" and it is a linear Diophantine equation (see section 

Proof 4.18.1. Obviously, if a and b are relatively prime, we know that d is then 1.

To prove the Bezout identity, let us first consider the set:

$S = \{d = ax + by \mid x, y \in \mathbb{Z}, ax + by > 0\}$ (4.51)

As $S \subset \mathbb{N}$ and $S \neq \emptyset$, we can use the well-ordering principle and conclude that S has a smallest
element d. We can then write:

$d = ax_0 + by_0$


for some choice of $x_0, y_0 \in \mathbb{Z}$. So it is sufficient to prove that $d = (a, b)$ to prove the Bezout
identity!

Let us proceed with a proof by contradiction by assuming $d \nmid a$. If this is the case,
following the Euclidean division, there exist $q, r \in \mathbb{Z}$ such that $a = qd + r$, where $0 < r < d$.
But then:

$r = a - qd = a - q(ax_0 + by_0) = a(1 - qx_0) + b(-qy_0)$ (4.53)

Thus we have $r \in S$ and $r < d$, which contradicts the fact that d is the smallest
element of S. Thus we have proven not only that $d \mid a$, but also that d always exists and, in the
same way, we prove that $d \mid b$. Finally, any common divisor c of a and b divides $ax_0 + by_0 = d$,
so $c \leq d$ and d is indeed the greatest common divisor.

□ Q.E.D. 

Corollary 4.18.1. As an important corollary, let us now prove that if $a, b \in \mathbb{Z}$ are such that $ab \neq 0$, then:

$S = \{ax + by \mid x, y \in \mathbb{Z}\}$ (4.54)

is the set of all multiples of $d = (a, b)$.

Proof 4.18.2. As $d \mid a$ and $d \mid b$, we necessarily have $d \mid (ax + by)$ for any $x, y \in \mathbb{Z}$. Let
$M = \{nd \mid n \in \mathbb{Z}\}$. Our problem is then reduced to proving that $S = M$.

First, given $s \in S$, we have $d \mid s$, which implies $s \in M$.

Conversely, given $m \in M$, we have $m = nd$ for a certain $n \in \mathbb{Z}$.

As $d = ax_0 + by_0$ for some choice of integers $x_0, y_0 \in \mathbb{Z}$, then:

$m = nd = n(ax_0 + by_0) = a(nx_0) + b(ny_0) \in S$ (4.55)

□ Q.E.D.

The assumptions may seem complicated, but focus your attention for a moment on the last equality.
You will quickly understand!
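The corollary can be illustrated numerically on the pair (36, 54) used earlier. This is only a sketch on a bounded set of coefficients, not a proof: the helper `linear_combinations` and the bound of 10 are assumptions of this example, and `math.gcd` from the standard library is used as the reference GCD.

```python
from math import gcd

def linear_combinations(a: int, b: int, bound: int) -> set[int]:
    """All values a*x + b*y with |x| <= bound and |y| <= bound."""
    rng = range(-bound, bound + 1)
    return {a * x + b * y for x in rng for y in rng}

a, b = 36, 54
d = gcd(a, b)
combos = linear_combinations(a, b, bound=10)

# Every combination is a multiple of d (S is contained in M) ...
assert all(s % d == 0 for s in combos)
# ... and d itself is the smallest positive combination (Bezout identity).
assert min(s for s in combos if s > 0) == d
```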


If, instead of defining the greatest common divisor of two non-zero integers, we allow one
of them to be equal to 0, say $a \neq 0$, $b = 0$, then we have $a \mid b$ and, according to
our definition of the GCD, it is clear that $(a, 0) = |a|$.


Given $d = (a, b)$ and $m \in \mathbb{Z}$, then we have the following properties of the GCD (without proof,
but if a reader requests them we will give the details):

P1. $(a, b + ma) = (a, b) = (a, -b)$


P2. $(am, bm) = |m|(a, b)$ where $m \neq 0$

P4. If $g \in \mathbb{Z} \setminus \{0\}$ is such that $g \mid a$ and $g \mid b$, then $\left(\frac{a}{g}, \frac{b}{g}\right) = \frac{1}{|g|}(a, b)$

In some books, these four properties are proved by using the property itself. Personally,
we abstain from this approach, because it amounts to taking the statement of the property
as its own proof.

Let us now develop a method (algorithm) that will be very useful to us to calculate (determine)
the greatest common divisor of two integers (sometimes useful in computer science).

4.3.2 Euclidean Algorithm 

The Euclidean algorithm is an algorithm for determining the greatest common divisor of two
integers (we hesitated to put this subject in the section Theoretical Computing...).

To approach this method intuitively, think of an integer as a length, of a pair of integers as
the sides of a rectangle, and of their GCD as the side of the largest square that can tile (pave)
this rectangle exactly (if you think about it for a moment, it is quite logical!).

The algorithm decomposes the original rectangle into squares, smaller and smaller, by
successive Euclidean division of the length by the width, then of the width by the remainder, until a
zero remainder is reached. We must understand this geometric approach to then understand the algorithm.


Let us consider that we seek the GCD of (a, b) where b is equal to 21 and a is equal to
15, and keep in mind that the GCD, besides dividing a and b, must leave a
zero remainder! In other words, it must also divide the remainder of the division of b by a!

So we have the following rectangle of 21 by 15: 

Figure 4.24 - First step of the GCD algorithm 



First we check whether 15 is the GCD (we always start with the smallest of the two numbers). We then divide 21 by
15, which is geometrically equivalent to:

Figure 4.25 - Second step of the GCD algorithm

15 is therefore not the GCD (we suspected it...). We immediately see that we cannot
pave the rectangle with a square of 15 by 15.

So we have a remainder of 6 (left rectangle). The GCD, as we know, must by
definition divide this remainder and leave a zero remainder.

So we have a rectangle of 15 by 6. We now seek to pave this new rectangle,
because we know that the greatest common divisor is by construction less than or equal
to 6. Then we have:

Figure 4.26 - Third step of the GCD algorithm

So we divide 15 by the remainder 6 (the new remainder will be less than 6, which immediately permits
us to test whether the remainder 6 is the GCD). We get:

Figure 4.27 - Fourth step of the GCD algorithm

Again, we cannot pave the rectangle with squares only. In other words, we have a
non-zero remainder, which is 3. We now have a rectangle of 6 by 3. We seek
to pave this new rectangle, because we know that the greatest common divisor is by
construction less than or equal to 3 and that it will leave a remainder equal to zero.
We then have geometrically:


Figure 4.28 - Fifth step of the GCD algorithm 

We divide 6 by 3 (the remainder will be less than 3, which permits us to test immediately whether
3 is the GCD):

Figure 4.29 - Sixth step of the GCD algorithm

and it's all good! The division by 3 leaves a remainder equal to zero, and 3 divides
the previous remainder 6, so 3 is the GCD. So we have in the end:

Figure 4.30 - Summary of the GCD algorithm 

Now let us see the equivalent formal approach. 

Given $a, b \in \mathbb{Z}$, where $a > 0$. Applying successively the Euclidean division (with $b > a$), we
get the following sequence of equations:

$b = a q_1 + r_1, \quad 0 < r_1 < a$
$a = r_1 q_2 + r_2, \quad 0 < r_2 < r_1$
$r_1 = r_2 q_3 + r_3, \quad 0 < r_3 < r_2$
$\vdots$
$r_{j-2} = r_{j-1} q_j + r_j, \quad 0 < r_j < r_{j-1}$
$r_{j-1} = r_j q_{j+1}$




If $d = (a, b)$, then $d = r_j$, the last non-zero remainder, with the corresponding pseudo-code algorithm:

Algorithm 1: GCD pseudo-code algorithm
Data: a, b
Result: b

    r := a mod b;
    while r ≠ 0 do
        a := b;
        b := r;
        r := a mod b;
    end
    Display b;
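The pseudo-code above can be sketched in Python as follows. The function name `gcd_euclid` is an assumption of this example, and taking absolute values first is a choice made here so that negative inputs are also accepted.

```python
def gcd_euclid(a: int, b: int) -> int:
    """Greatest common divisor by repeated Euclidean division."""
    a, b = abs(a), abs(b)
    while b != 0:
        # Replace the pair (a, b) by (b, remainder of the division of a by b).
        a, b = b, a % b
    return a
```

With the rectangle of the geometric example, `gcd_euclid(21, 15)` goes through the remainders 6, 3 and 0 and returns 3.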

Otherwise, even more formally:

Proof 4.18.3. We first want to prove that $r_j = (a, b)$. Following the property P1:

$(a, b + ma) = (a, b) = (a, -b)$ (4.57)

we have:

$(a, b) = (a, r_1) = (r_1, r_2) = \ldots = (r_{j-1}, r_j)$ (4.58)

and, since $r_j \mid r_{j-1}$, this last GCD is $r_j$.

To prove the second property of Euclid's algorithm (the GCD is a linear combination of a and b),
we write the penultimate equation of the system in the form:

$r_j = r_{j-2} - q_j r_{j-1}$ (4.59)

Now, using the equation that precedes it in the system, we have:

$r_j = r_{j-2} - q_j(r_{j-3} - q_{j-1} r_{j-2}) = (1 + q_j q_{j-1}) r_{j-2} + (-q_j) r_{j-3}$ (4.60)

Continuing this process, we can express $r_j$ as a linear combination of a and b.

□ Q.E.D. 



Let us calculate the greatest common divisor of (429, 966) and express this number
as a linear combination of 429 and 966.

$966 = 429 \cdot 2 + 108$
$429 = 108 \cdot 3 + 105$
$108 = 105 \cdot 1 + 3$
$105 = 3 \cdot 35$

We therefore conclude that:

$r_j = d = (966, 429) = 3$ (4.62)

and, in addition, that:

$3 = 108 - 105 \cdot 1 = 108 - (429 - 108 \cdot 3) = 108 \cdot 4 - 429$
$= (966 - 429 \cdot 2) \cdot 4 - 429$ (4.63)
$= 966 \cdot 4 - 429 \cdot 9 = 966 \cdot 4 + 429 \cdot (-9)$

Thus the GCD is indeed expressed as a linear combination of a and b.
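The back-substitution carried out by hand above can be automated. The following is a sketch of the usual "extended" form of the Euclidean algorithm, which is a swapped-in iterative variant rather than the book's own presentation: the name `extended_gcd` and the bookkeeping variables are assumptions of this example.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y, for a, b >= 0."""
    # Invariants kept at every step:
    #   old_r = a*old_x + b*old_y   and   r = a*x + b*y
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

d, x, y = extended_gcd(966, 429)
assert d == 3 and 966 * x + 429 * y == 3
```

For (966, 429) the coefficients obtained are x = 4 and y = -9, in agreement with relation (4.63).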

Definition (#54): We say, as a reminder, that the integers $a_1, a_2, \ldots, a_n$ are "relatively prime" if:

$(a_1, a_2, \ldots, a_n) = 1$ (4.64)

4.3.3 Least Common Multiple 

The least common multiple (also named the "lowest common multiple" or "smallest common 
multiple") of two integers a and b, usually denoted by LCM(a, b), is the smallest positive integer 
that is divisible by both a and b. Since division of integers by zero is undefined, this definition 
has meaning only if a and b are both different from zero. 

The LCM is familiar from grade-school arithmetic as the "lowest common denominator LCD " 
(also named "smallest common denominator") that must be determined before fractions can be 
added, subtracted or compared. The LCM of more than two integers is also well-defined: it is 
the smallest positive integer that is divisible by each of them. 

Definitions (#55):

D1. Given $a_1, a_2, \ldots, a_n \in \mathbb{Z} \setminus \{0\}$, we say that m is a "common multiple" of $a_1, a_2, \ldots, a_n$
if $a_i \mid m$ for $i = 1, 2, \ldots, n$.

D2. Given $a_1, a_2, \ldots, a_n \in \mathbb{Z} \setminus \{0\}$, we name "lowest common multiple" (LCM) of
$a_1, a_2, \ldots, a_n$, denoted:

$[a_1, a_2, \ldots, a_n]$ (4.65)

the smallest positive integer among all common multiples of $a_1, a_2, \ldots, a_n$.


E1. Let us consider the positive integers 3 and 5. A common multiple of 3 and 5
is a positive integer which is both a multiple of 3 and a multiple of 5, in other words,
which is divisible by 3 and by 5. We therefore have:

$M_3 = \{3, 6, 9, 12, 15, 18, 21, 24, 27, 30, \ldots\}$

$M_5 = \{5, 10, 15, 20, 25, 30, 35, 40, 45, \ldots\}$

The intersection (represented in the original by a Venn diagram) gives the following set of common multiples:

$M_3 \cap M_5 = \{15, 30, 45, 60, \ldots\}$ (4.67)

and therefore the LCM is given by:

$\text{LCM} = \min\{15, 30, 45, 60, \ldots\} = 15$ (4.68)

[Figure: another visualization of the least common multiple (LCM)]

We see that the set of all common multiples of 3 and 5 is the set of multiples of 15.



Given $a_1, a_2, \ldots, a_n \in \mathbb{Z} \setminus \{0\}$. Then the least common multiple exists. Indeed, consider
the set E of positive natural integers m that are divisible by every $a_i$. We write:

$E = \{m \in \mathbb{N}^* \mid a_i \mid m, \; i = 1, 2, \ldots, n\}$ (4.69)

Since we necessarily have $|a_1 a_2 \ldots a_n| \in E$, the set is not empty and, according to
the well-ordering principle, the set E contains a smallest positive element.

Let us now see some theorems related to the LCM: 

Theorem 4.19. If m is any common multiple of $a_1, a_2, \ldots, a_n$ then $[a_1, a_2, \ldots, a_n] \mid m$, that is to
say that every common multiple is a multiple of the LCM.

Proof 4.19.1. Let $M = [a_1, a_2, \ldots, a_n]$. Then, by the Euclidean division, there are integers
q and r such that:

$m = qM + r, \quad 0 \leq r < M$ (4.70)

It suffices to show that $r = 0$. Let us suppose that $r \neq 0$ (reductio ad absurdum). Since $a_i \mid m$ and
$a_i \mid M$, we have $a_i \mid r$ for $i = 1, 2, \ldots, n$. So r is a positive common multiple of $a_1, a_2, \ldots, a_n$
smaller than the LCM. We have just obtained a contradiction, which proves the theorem.

□ Q.E.D. 



Theorem 4.20. If $k > 0$, then $[ka_1, ka_2, \ldots, ka_n] = k[a_1, a_2, \ldots, a_n]$

The proof will be assumed obvious (if not, as always, contact us and we will add the details!)

Theorem 4.21. $[a, b] \cdot (a, b) = |ab|$

Proof 4.21.1. 

Lemma 4.21.1. For this proof, we will use "Euclid's lemma", which says that if $a \mid bc$ and
$(a, b) = 1$ then $a \mid c$.

In particular, Euclid's lemma captures a fundamental property of prime numbers, namely: if
a prime divides the product of two numbers, it must divide at least one of those numbers. It
is also named "Euclid's first theorem". This lemma is the key to the proof of the fundamental
theorem of arithmetic that we will see further below.

Indeed, this can be easily verified because we have seen that there exist $x, y \in \mathbb{Z}$ such that
$1 = ax + by$ and then $c = acx + bcy$. But $a \mid ac$ and $a \mid bc$ imply that $a \mid (acx + bcy)$, that is to say
that $a \mid c$.

Ok, let us now return to our theorem:

Since $(a, b) = (a, -b)$ and $[a, b] = [a, -b]$, it suffices to prove the result for positive integers a
and b.

First of all, let us consider the case where $(a, b) = 1$. The integer $[a, b]$ being a multiple of a, we
can write $[a, b] = ma$. Thus, we have $b \mid ma$ and, since $(a, b) = 1$, it follows, by Euclid's lemma,
that $b \mid m$. Therefore, $b \leq m$ and then $ab \leq am$. But ab is a common multiple of a and b that
cannot be smaller than the LCM, therefore $ab = ma = [a, b]$.

For the general case, that is to say $(a, b) = d > 1$, we have, according to the property P4:

$\left(\frac{a}{d}, \frac{b}{d}\right) = 1$

and with the result obtained previously that:

$\left[\frac{a}{d}, \frac{b}{d}\right] \cdot \left(\frac{a}{d}, \frac{b}{d}\right) = \frac{a}{d} \cdot \frac{b}{d}$

When we multiply both sides of the equation by $d^2$, the result follows (using Theorem 4.20 and property P2) and the proof is done.

□ Q.E.D. 
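Theorem 4.21 gives a practical way to compute the LCM of two integers from their GCD. A minimal sketch, assuming the helper name `lcm` (chosen for this example) and using `math.gcd` from the standard library:

```python
from math import gcd

def lcm(a: int, b: int) -> int:
    """Least common multiple of two non-zero integers, via [a, b] * (a, b) = |ab|."""
    if a == 0 or b == 0:
        raise ValueError("a and b must be non-zero")
    return abs(a * b) // gcd(a, b)
```

For the earlier examples this gives 15 for the pair (3, 5) and 108 for the pair (36, 54), whose GCD is 18.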


4.3.4 Fundamental Theorem of Arithmetic 

The fundamental theorem of arithmetic says that every natural number n > 1 can be written as 
a product of primes, and this representation is unique, except for the order in which the prime 
factors are arranged. 

The theorem establishes the importance of prime numbers. Essentially, they are the building
blocks of the positive integers, each positive integer being composed of primes in a unique way.


This theorem is sometimes named "factorization theorem" (wrongly... because some
other theorems have the same name...).

So let's go:

Theorem 4.22. Every integer greater than 1 either is prime itself or is a product of prime
numbers, and this product is unique up to the order of the factors.


This theorem is one of the main reasons why 1 is not considered a prime number: if 1 
were prime, the factorization would not be unique. 

Proof 4.22.1. The proof uses Euclid's lemma: if a prime p divides the product of two natural
numbers a and b, then either p divides a or p divides b (or both).

If n is prime, and therefore the product of a single prime, namely itself, the result is true
and the proof is complete (saying that a prime number is a product of itself is obviously an abuse
of language!). Suppose that n is not prime (and strictly greater than 1) and consider the set:

$D = \{d \in \mathbb{N} : d \mid n \text{ and } 1 < d < n\}$ (4.73)

So $D \subset \mathbb{N}$ and, since n is composite, we have $D \neq \emptyset$. According to the well-ordering
principle, D has a smallest element $p_1$, which is prime, otherwise the minimal choice of $p_1$ would be
contradicted. We can then write $n = p_1 n_1$. If $n_1$ is prime, then the proof is complete. If $n_1$ is also
composite, then we repeat the same argument as before and we deduce the existence of a prime
number $p_2$ and of an integer $n_2 < n_1$ such that $n = p_1 p_2 n_2$. By continuing, we inevitably come
to the conclusion that some $n_k$ will be prime.

So finally we have shown, using the well-ordering principle, that any integer greater than 1 can be
decomposed into prime factors.

□ Q.E.D. 
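The construction used in the proof (repeatedly splitting off the smallest divisor greater than 1, which is necessarily prime) can be sketched as follows. The function name `prime_factors` is an assumption of this example, and stopping the search at the square root of the remaining number is an optimization that the proof itself does not mention.

```python
def prime_factors(n: int) -> list[int]:
    """Prime factors of n > 1 in non-decreasing order."""
    if n < 2:
        raise ValueError("n must be greater than 1")
    factors = []
    d = 2
    while d * d <= n:
        # When d divides the current n, d is its smallest divisor > 1, hence prime.
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        # What remains has no divisor up to its square root: it is prime.
        factors.append(n)
    return factors
```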

We do not know to this day a simple law that allows us to calculate the n-th prime number $p_n$. Thus,
to know whether an integer m is prime, it is almost easier at this date to verify its presence in a table
of prime numbers.



In fact, we use nowadays the following method:

Given an integer m, if we want to determine whether it is prime or not, we check whether it is
divisible by the prime numbers $p_n$ belonging to the set:

$\{p_n \in \mathbb{N} \mid p_n \leq \sqrt{m}\}$ (4.74)


The integer 223 is divisible neither by 2, nor by 3, nor by 5, nor by 7, nor by 11, nor by
13. It is useless to continue with the next prime number, because $17^2 = 289 > 223$. We
conclude therefore that the number 223 is prime.
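The trial-division method just described can be sketched as follows. The function name `is_prime` is an assumption of this example; the list of small primes is rebuilt on the fly rather than read from a table, which is a simplification chosen here.

```python
from math import isqrt

def is_prime(m: int) -> bool:
    """Trial division of m by the primes p with p <= sqrt(m)."""
    if m < 2:
        return False
    small_primes = []  # primes found so far, all <= sqrt(m)
    for p in range(2, isqrt(m) + 1):
        # p is prime exactly when no smaller prime divides it.
        if all(p % s != 0 for s in small_primes):
            small_primes.append(p)
            if m % p == 0:
                return False
    return True
```

For m = 223 the candidates tested are 2, 3, 5, 7, 11 and 13, as in the example above.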

4.3.5 Congruences (modular arithmetic) 

Modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" upon 
reaching a certain value — the modulus (plural moduli). 

A familiar use of modular arithmetic is in the 12-hour clock (and also the calendar), in which 
the day is divided into two 12-hour periods. If the time is 7:00 now, then 8 hours later it will 
be 3:00. Usual addition would suggest that the later time should be 7 + 8 = 15, but this is not 
the answer because clock time "wraps around" every 12 hours; in 12-hour time, there is no "15 
o’clock". Likewise, if the clock starts at 12:00 (noon) and 21 hours elapse, then the time will be 
9:00 the next day, rather than 33:00. Because the hour number starts over after it reaches 12,
this is arithmetic modulo 12. According to the definition below, 12 is congruent not only to 12
itself, but also to 0, so the time named "12:00" could also be called "0:00", since 12 is congruent
to 0 modulo 12.

Figure 4.31 - Time-keeping on this clock uses arithmetic modulo 12 (source: Wikipedia) 

Definition (#56): Let $m \in \mathbb{Z} \setminus \{0\}$. If a and b have the same remainder when divided by m in the
Euclidean division, then we say "a is congruent to b modulo m", and we write:

$a \equiv b \mod (m)$



or, equivalently, there is (at least) one relative integer k such that:

$a = b + km$

We also name the number b a "residue". Thus, a residue is an integer congruent to another,
modulo a given integer m. The reader can verify that this requires that:

$m \mid (a - b)$



Rl. The reader must well understand that congruence implies a null remainder for the 

R2. In some books, in addition to 0, the values 1 and -1 are also excluded as possible values of m in the
definition of congruence.

R3. Behind the term congruence are hidden similar concepts of different levels of abstraction:

• In modular arithmetic, we say that "two integers a and b are congruent modulo
m if they have the same remainder in the Euclidean division by m". We can also
say that they are congruent modulo m if their difference is a multiple of m.

• In the study of oriented angles, we say that "two measurements are congruent modulo
$2\pi$ [rad] if and only if their difference is a multiple of $2\pi$ [rad]". This characterizes
two measures of the same angle (see section Trigonometry).

• In algebra, we speak of congruence modulo I in a commutative ring (see section
Set Theory) for which I is an ideal: "x is congruent to y modulo I if and only if
their difference belongs to I". This congruence is an equivalence relation compatible
with the operations of addition and multiplication, and it makes it possible to
define a quotient ring of the parent set by its ideal I.

• We sometimes see in the study of geometry (see section Euclidean Geometry) the
term "congruence" used in place of "similar". It is then a simple equivalence relation
on the set of plane figures.

The relation of congruence $\equiv$ is an equivalence relation (see section Operators); in other words,
given $a, b, c, m \in \mathbb{Z}$, $m > 1$, the congruence relation is:

P1. Reflexive: $a \equiv a \mod (m)$

P2. Symmetric: $a \equiv b \mod (m) \Rightarrow b \equiv a \mod (m)$

P3. Transitive: $a \equiv b \mod (m), \; b \equiv c \mod (m) \Rightarrow a \equiv c \mod (m)$






The properties P1 and P2 are obvious (if this is not the case, please let us know and we will
develop them!). We will prove only P3.

Proof 4.22.2. The assumptions imply that there are integers k and l such that:

$b = a + km \qquad c = b + lm$

But then:

$c = b + lm = (a + km) + lm = a + (k + l)m$

This proves that a and c are congruent modulo m.

□ Q.E.D. 

The relation of congruence $\equiv$ is compatible with the sum and the product (remember that the power
is ultimately an extension of the product!).

Indeed, given $a, b, a', b', m \in \mathbb{Z}$, $m > 1$, such that $a \equiv b \mod (m)$ and
$a' \equiv b' \mod (m)$, then:

P1. $a + a' \equiv b + b' \mod (m)$

P2. $aa' \equiv bb' \mod (m)$

Proof 4.22.3. We have:

$a = b + km \qquad a' = b' + lm$

by hypothesis. But then:

$a + a' = b + b' + (l + k)m$

which proves P1. We also have:

$aa' = bb' + blm + b'km + klm^2 = bb' + sm$

with $s = bl + b'k + klm$, which proves P2.

□ Q.E.D. 


The congruence relation behaves in many respects like the relation of equality. However, one
property of the relation of equality does not hold for congruence, namely
simplification: if $ab \equiv ac \mod (m)$, we do not necessarily have $b \equiv c \mod (m)$.

For example:

$2 \cdot 1 \equiv 2 \cdot 3 \mod (4)$ but $1 \not\equiv 3 \mod (4)$ (4.86)
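These properties are easy to check numerically. In the sketch below, the helper name `congruent` and the sample values are assumptions made for illustration; the last line reproduces the counter-example (4.86).

```python
def congruent(a: int, b: int, m: int) -> bool:
    """True when a and b are congruent modulo m, i.e. m | (a - b)."""
    return (a - b) % m == 0

m = 4
a, b = 9, 1      # 9 is congruent to 1 modulo 4
a2, b2 = 6, 2    # 6 is congruent to 2 modulo 4
assert congruent(a, b, m) and congruent(a2, b2, m)
assert congruent(a + a2, b + b2, m)   # compatibility with the sum
assert congruent(a * a2, b * b2, m)   # compatibility with the product

# Simplification can fail: 2*1 and 2*3 are congruent modulo 4, but 1 and 3 are not.
assert congruent(2 * 1, 2 * 3, 4) and not congruent(1, 3, 4)
```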

So far we have seen the properties of congruences involving a single modulus. We will now 
study the behavior of the congruence relation on a change of modulus. 


P1. If $a \equiv b \mod (m)$ and $d \mid m$, then $a \equiv b \mod (d)$

P2. If $a \equiv b \mod (r)$ and $a \equiv b \mod (s)$, then a and b are congruent modulo $[r, s]$

We think these two properties are obvious. We do not need to go into details for P1. For P2,
$b - a$ is a multiple of both r and s, since by hypothesis:

$\frac{b - a}{r} = k, \; \frac{b - a}{s} = l \Rightarrow b - a = rk = sl$ (4.87)

$b - a$ is then a common multiple of r and s, hence (by Theorem 4.19) a multiple of the LCM of r and s, which proves P2.

From these properties it follows that if we denote by f(x) a polynomial with integer coefficients
(positive or negative):

$f(x) = Ax^n + Bx^{n-1} + Cx^{n-2} + \ldots + Kx + L$ (4.88)

then the congruence $a \equiv b \mod (m)$ will also give $f(a) \equiv f(b) \mod (m)$.
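As an illustration, the statement can be checked on one sample polynomial. The polynomial, the modulus and the helper `poly_eval` below are arbitrary choices made for this sketch, not taken from the text.

```python
def poly_eval(coeffs: list[int], x: int) -> int:
    """Evaluate a polynomial from its coefficients, highest degree first (Horner's rule)."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

coeffs = [2, -3, 0, 7]   # f(x) = 2x^3 - 3x^2 + 7, an arbitrary example
m = 5
for a in range(-10, 11):
    b = a + 3 * m        # b is congruent to a modulo m
    # f(a) and f(b) must then be congruent modulo m as well.
    assert (poly_eval(coeffs, a) - poly_eval(coeffs, b)) % m == 0
```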

If we replace x successively by all integers in a polynomial f(x) with integer coefficients, and
if we take the remainders modulo m, these remainders repeat with period m (in the sense
that the congruence is satisfied), since we have, regardless of the integers m and x:

$f(x) \equiv f(x + m) \mod (m)$ (4.89)

We then deduce the impossibility of solving the following congruence:

$f(x) \equiv r \mod (m)$ (4.90)

in integers, if r is any one of the "non-residues" (a residue that does not satisfy the

4.3.5.1 Congruence Class

Definition (#57): We name "modulo m congruence class" a subset of the set $\mathbb{Z}$
defined by the property that two elements a and b of $\mathbb{Z}$ are in the same class if and only if $a \equiv b
\mod (m)$, in other words a set of elements that are all congruent to one another modulo m.


We saw in the section Operators that this is in fact an equivalence class, as the congruence
modulo m is, as we have proved above, an equivalence relation!




Given m = 3. We divide the set of integers into congruence classes modulo 3.

Here are, for example, three sets whose elements are congruent with one another
(notice what the union of all these classes gives!):

$\{\ldots, -9, -6, -3, 0, 3, 6, 9, 12, \ldots\}$

$\{\ldots, -8, -5, -2, 1, 4, 7, 10, 13, \ldots\}$

$\{\ldots, -7, -4, -1, 2, 5, 8, 11, \ldots\}$

Thus we see that any two elements of the same congruence class are congruent modulo
3. However, we cannot write $-9 \equiv -8 \mod (3)$, since -9 is in
the first class and -8 in the second.

The smallest non-negative number of the first class is 0, that of the second is 1 and that of the
last is 2. Thus, we will denote these three classes respectively $[0]_3$, $[1]_3$, $[2]_3$, the number
3 in the index indicating the modulus.

It is interesting to notice that if we take any number of the first class and any number of
the second class, then their sum is always in the second class. This can be generalized
and allows us to define a sum and a product of classes modulo 3 by writing:

$[0]_3 + [0]_3 = [0]_3; \quad [0]_3 + [1]_3 = [1]_3$
$[0]_3 + [2]_3 = [2]_3; \quad [1]_3 + [1]_3 = [2]_3$
$[1]_3 + [2]_3 = [0]_3; \quad [2]_3 + [2]_3 = [1]_3$

$[0]_3 \times [0]_3 = [0]_3; \quad [0]_3 \times [1]_3 = [0]_3$
$[0]_3 \times [2]_3 = [0]_3; \quad [1]_3 \times [1]_3 = [1]_3$
$[1]_3 \times [2]_3 = [2]_3; \quad [2]_3 \times [2]_3 = [1]_3$
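The tables above can be regenerated mechanically. In this sketch each class is represented by its smallest non-negative element; the function name `class_tables` is an assumption of the example.

```python
def class_tables(m: int) -> tuple[list[list[int]], list[list[int]]]:
    """Addition and multiplication tables of the classes modulo m (representatives 0..m-1)."""
    add = [[(a + b) % m for b in range(m)] for a in range(m)]
    mul = [[(a * b) % m for b in range(m)] for a in range(m)]
    return add, mul

add3, mul3 = class_tables(3)

# The result does not depend on the representatives chosen in each class:
assert (-9 + 13) % 3 == add3[0][1]   # -9 is in [0]_3 and 13 is in [1]_3
assert (8 * 11) % 3 == mul3[2][2]    # 8 and 11 are both in [2]_3
```

Row a and column b of `add3` (respectively `mul3`) hold the representative of the sum (respectively product) of the classes of a and b.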

Thus, for any m > 1, the congruence class:

$a \mod (m)$ (4.96)

is the set of integers congruent to a modulo m (and congruent modulo m between them)! This
class is denoted by:

$[a]_m := a \mod (m)$

The bracketed "and congruent modulo m between them" is due to the fact that,
congruence being an equivalence relation, we have, as proved above, that if $b \equiv a
\mod (m)$ and $c \equiv a \mod (m)$, then $b \equiv c \mod (m)$.



Definition (#58): The set of congruence classes $[a]_m$ (which are "equivalence classes", since congruence is
an equivalence relation), for a fixed m, gives what we name a "quotient
set" (see section Operators). More rigorously, we speak of the "quotient set of $\mathbb{Z}$ by the congruence
relation", whose elements are the congruence classes (or: equivalence classes) and which then
form the ring $\mathbb{Z}/m\mathbb{Z}$. We deduce from the definition the following two trivial properties:

1. The number b is in the class $[a]_m$ if and only if $a \equiv b \mod (m)$

2. The classes $[a]_m$ and $[b]_m$ are equal if and only if $a \equiv b \mod (m)$

Theorem 4.23. There are exactly m different congruence classes modulo m, i.e.:

$[0]_m, [1]_m, \ldots, [m-1]_m$

Proof 4.23.1. Given m > 1, any integer a is congruent modulo m to one and only one
integer r of the set $\{0, 1, 2, \ldots, m - 1\}$ (notice, it is important, that we restrict ourselves
to the non-negative integers, without taking into account the negative ones!). In addition, this integer
r is exactly the remainder of the division of a by m. In other words, if $0 \leq r < m$, then:

$a \equiv r \mod (m)$ (4.98)

if and only if $a = qm + r$, where q is the quotient of a by m and r is the remainder. The proof is
an immediate consequence of the definition of congruence and of the Euclidean division.

□ Q.E.D. 

Definition (#59): An integer b in a congruence class modulo m is named a "representative of
this class" (it is clear from the equivalence relation that two representatives of the same class
are congruent modulo m to each other).

We are now able to build an addition and a multiplication on the congruence classes. To
define the sum of two classes $[a]_m$, $[b]_m$, it suffices to take one representative from each class,
to add them, and to take the congruence class of the result. Thus (see the examples above):

$[a]_m + [b]_m = [a + b]_m$ (4.99)

And the same for the multiplication:

$[a]_m \cdot [b]_m = [a \cdot b]_m$ (4.100)

By construction of the addition and multiplication, we see that the class of 0 (zero) is the neutral element
for addition:

$[a]_m + [0]_m = [a]_m, \quad \forall a \in \mathbb{Z}, \forall m \in \mathbb{N}$ (4.101)

and the class of the integer 1 is the neutral element for multiplication:

$[a]_m \cdot [1]_m = [a]_m, \quad \forall a \in \mathbb{Z}, \forall m \in \mathbb{N}$ (4.102)

Definition (#60): An element $[a]_m$ of $\mathbb{Z}/m\mathbb{Z}$ is a "unit" if there is an element $[b]_m \in \mathbb{Z}/m\mathbb{Z}$
such that $[a]_m \cdot [b]_m = [1]_m$. The following theorem helps to characterize the classes modulo m which are
units in $\mathbb{Z}/m\mathbb{Z}$:



Theorem 4.24. Given $[a]$ an element of $\mathbb{Z}/m\mathbb{Z}$. Then $[a]$ is a unit if and only if $(a, m) = 1$.

Proof 4.24.1. Suppose first that $(a, m) = 1$. Then, by the Bezout theorem, there exist integers s and r such that:

$as + mr = 1$ (4.103)

In other words, as is congruent to 1 modulo m. But this is equivalent by definition to writing
$[a][s] = [1]$, showing that $[a]$ is a unit. Conversely, if $[a]$ is a unit, there exists a
class $[s]$ such that $[a][s] = [1]$, that is, $as + mr = 1$ for some integer r; any common divisor
of a and m then divides 1, hence $(a, m) = 1$.

Thus, we have just proved that Z/Z is indeed a ring since it has an addition, a multiplication, a 
neutral element and an inverse! ! 

□ Q.E.D.

Complete set of residues

Definition (#61): A set of numbers $a_0, a_1, \ldots, a_{m-1}$ forms a "complete set of residues"
modulo m, also named a "covering system", if they satisfy $a_i \equiv i \mod (m)$ for
$i = 0, 1, \ldots, m - 1$.

This type of system will help us, in the section Cryptography, to introduce an
important function used in secure communication devices at the end of the 20th century and
the beginning of the 21st century.

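To introduce this concept, consider the following finite system of congruences modulo 6:

$6 \equiv 0 \mod (6)$
$13 \equiv 1 \mod (6)$
$2 \equiv 2 \mod (6)$
$-3 \equiv 3 \mod (6)$
$22 \equiv 4 \mod (6)$
$11 \equiv 5 \mod (6)$
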
where, as the reader will probably have noticed, no residue is repeated in the list and no two
of the numbers are congruent to each other modulo 6 (it is this last point that obliges us to stop
at 5 in our example). We then say that the residues are "mutually incongruent".

If these conditions are met, then we say that the ordered set {6, 13, 2, -3, 22, 11} is a "complete
system of residues modulo m" as already defined. Such a set is not unique for a given modulus.
Thus, the set {0, 1, 2, 3, 4, 5} is also a complete (trivial) system of residues modulo 6.

If we eliminate from this complete system all numbers that are not coprime to m, then we have a
"reduced residue system modulo m". So in the above example, the reduced residue system
modulo 6 is {13, 11}.

Reduced systems will be useful to us in the section Cryptography to prove an important result
about asymmetric public key systems.
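
The two definitions above can be checked mechanically. The following Python sketch is only an illustration: the function names `is_complete_residue_system` and `reduced_residue_system` are our own and do not come from the text. It tests the set {6, 13, 2, -3, 22, 11} modulo 6 and extracts its reduced system.

```python
from math import gcd

def is_complete_residue_system(numbers, m):
    """True if `numbers` meets every residue class modulo m exactly once."""
    return sorted(x % m for x in numbers) == list(range(m))

def reduced_residue_system(numbers, m):
    """Keep only the members that are coprime to m."""
    return [x for x in numbers if gcd(x, m) == 1]

system = [6, 13, 2, -3, 22, 11]
print(is_complete_residue_system(system, 6))  # True
print(reduced_residue_system(system, 6))      # [13, 11]
```

Note that Python's `%` operator returns a non-negative remainder for a positive modulus, so -3 is correctly placed in the class of 3.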


We will also see, in the section Cryptography, that the "Euler indicator function", when m is
prime (which is not the case in the previous example), gives the cardinal of the reduced system
modulo m as being equal to:

$\varphi(m) = m - 1 = \mathrm{Card}(\{n < m : (n, m) = 1\})$

So under the assumption that m is prime, the reduced residue system is obviously of the form:

$\{r_1, r_2, \ldots, r_{\varphi(m)}\}$ (4.106)

Chinese remainder theorem

In its basic form, the Chinese remainder theorem will determine a number n that, when divided 
by some given divisors, leaves given remainders. In Sun Tzu’s example (stated in modern 
terminology), what is the smallest number n that when divided by 3 leaves a remainder of 2, 
when divided by 5 leaves a remainder of 3, and when divided by 7 leaves a remainder of 2? 

The Chinese remainder theorem can therefore be seen as solving a linear system, but in modular
arithmetic. Many students and future engineers will never use this theorem in practice,
but some will meet it again in the field of cryptography (especially in the context of decryption).

There are several possible proofs, as always, but we opted for the one that seemed to us the
most educational.

Let m and n be two coprime integers. Then the special case of a system of two
congruences (see further below for an example of resolution of a system of three congruences):

$x \equiv a \mod (m)$
$x \equiv b \mod (n)$


has a unique solution modulo mn.

Proof 4.24.2. As m and n are assumed coprime, there exist two integers u and v such that
(application of the Bézout identity proved above):

$um + vn = 1$

Multiplying by a, we have:

$aum + avn = a$

That is to say:

$avn \equiv a \mod (m)$

Multiplying by b instead, we also have:

$bum + bvn = b$

That is to say:

$bum \equiv b \mod (n)$




So to be clear, we have so far:

$um + vn = 1 \Rightarrow avn \equiv a \mod (m), \quad bum \equiv b \mod (n)$

We recall that $bum \equiv b \mod (n)$ came from $bum + bvn = b$. Since avn is a multiple of n,
adding it to bum does not change the class modulo n, so we also have:

$bum + avn \equiv b \mod (n)$

Likewise, $avn \equiv a \mod (m)$ came from $aum + avn = a$, and since bum is a multiple of m:

$bum + avn \equiv a \mod (m)$

So to be always clear, we have so far:

$um + vn = 1 = (m, n)$

$avn \equiv a \mod (m)$
$bum \equiv b \mod (n)$

$bum + avn \equiv a \mod (m)$
$bum + avn \equiv b \mod (n)$

So finally we get that:

$x = bum + avn$

is a particular solution of the system. But we also have, $\forall i, j \in \mathbb{Z}$, by the definition of the
congruence:

$bum + avn + im \equiv a \mod (m)$
$bum + avn + jn \equiv b \mod (n)$



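For the same number to remain a solution of both congruences, the two added terms must
coincide, im = jn, which holds for instance with:

$i = n$

$j = m$

and therefore:

$bum + avn + nm \equiv a \mod (m)$
$bum + avn + mn \equiv b \mod (n)$

Therefore a slightly more general solution is:

$x = bum + avn + nm$

and, by extension, we have the general solution:

$x = bum + avn + znm$

with $z \in \mathbb{Z}$. Conversely, the difference of two solutions is divisible by both m and n, hence by nm
since they are coprime. This is why we sometimes say that the solution is unique "modulo nm".

□ Q.E.D.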
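
The construction used in the proof, x = bum + avn, translates directly into a few lines of Python. This is a minimal sketch under our own naming (`extended_gcd` and `crt_pair` are hypothetical helper names, not taken from the text); the Bézout coefficients u and v are obtained with the extended Euclidean algorithm.

```python
def extended_gcd(m, n):
    """Return (g, u, v) such that u*m + v*n == g == gcd(m, n)."""
    if n == 0:
        return m, 1, 0
    g, u1, v1 = extended_gcd(n, m % n)
    # g = u1*n + v1*(m % n) = v1*m + (u1 - (m // n)*v1)*n
    return g, v1, u1 - (m // n) * v1

def crt_pair(a, m, b, n):
    """Solve x = a (mod m), x = b (mod n) for coprime m and n.

    Returns the representative of the solution class in [0, m*n).
    """
    g, u, v = extended_gcd(m, n)
    if g != 1:
        raise ValueError("m and n must be coprime")
    return (b * u * m + a * v * n) % (m * n)

# Sun Tzu's example, solved by combining the congruences two at a time.
x = crt_pair(2, 3, 3, 5)      # x = 2 (mod 3) and x = 3 (mod 5)
x = crt_pair(x, 15, 2, 7)     # ... and x = 2 (mod 7)
print(x)                      # 23
```

Combining the congruences pairwise is a design choice made here for brevity; it works because 15 and 7 are again coprime.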


As an example, consider the problem of finding an integer x such that:

$x \equiv 2 \mod (3)$
$x \equiv 3 \mod (4)$
$x \equiv 1 \mod (5)$


A brute-force approach converts these congruences into sets and writes the elements out
up to the product 3 · 4 · 5 = 60 (the solutions modulo 60 for each congruence):

$x \in \{2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53, 56, 59\}$
$x \in \{3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59\}$
$x \in \{1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56\}$

To find an x that satisfies all three congruences, intersect the three sets to get: 


x = 11 

This solution is unique modulo 60, hence all solutions are expressed as:


$x \equiv 11 \mod (60)$
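
The brute-force intersection can be reproduced with Python sets. This is only a sketch of the enumeration described above; the helper name `residue_set` is our own.

```python
def residue_set(r, m, limit):
    """All integers x in [0, limit) with x = r (mod m), assuming 0 <= r < m."""
    return set(range(r, limit, m))

limit = 3 * 4 * 5
common = (residue_set(2, 3, limit)
          & residue_set(3, 4, limit)
          & residue_set(1, 5, limit))
print(sorted(common))  # [11]
```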


Another way to find a solution is with basic algebra, modular arithmetic, and stepwise
substitution.

We start by translating these congruences into equations for some integers t, s, and u:

$x = 2 + 3t$

$x = 3 + 4s$

$x = 1 + 5u$

Start by substituting the x from the first equation into the second congruence:

$2 + 3t \equiv 3 \mod (4)$

That is to say:

$3t \equiv 1 \mod (4)$

and, multiplying both sides by 3 (which is the inverse of 3 modulo 4, since 9 ≡ 1):

$t \equiv 3 \mod (4)$

meaning that t = 3 + 4s for some integer s. Now substitute t into the first equation:

$x = 2 + 3t = 2 + 3(3 + 4s) = 11 + 12s$

Substitute this x into the third congruence:

$11 + 12s \equiv 1 \mod (5)$

That is to say, reducing the coefficients modulo 5:

$1 + 2s \equiv 1 \mod (5)$

hence:

$2s \equiv 0 \mod (5)$

meaning that s = 0 + 5u for some integer u. Finally:

$x = 11 + 12s = 11 + 12(5u) = 11 + 60u$

So we have the solutions {11, 71, 131, 191, ...}.
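
The stepwise substitution can be sketched as a loop that keeps the current solution in the form x = x0 + M·t and looks for the smallest t satisfying the next congruence. The function name `solve_congruences` is hypothetical, and the moduli are assumed pairwise coprime (otherwise the search for t may fail and `next` raises `StopIteration`).

```python
def solve_congruences(congruences):
    """Solve a list of (remainder, modulus) pairs by successive substitution.

    Returns (x0, M), meaning that the solutions are x = x0 + M*k.
    Assumes pairwise coprime moduli.
    """
    x0, M = 0, 1
    for r, m in congruences:
        # Find t in 0..m-1 such that x0 + M*t = r (mod m).
        t = next(t for t in range(m) if (x0 + M * t - r) % m == 0)
        x0, M = x0 + M * t, M * m
    return x0, M

print(solve_congruences([(2, 3), (3, 4), (1, 5)]))  # (11, 60)
```

The inner search tries at most m values of t, which is fine for small moduli such as those of this example; for large moduli one would rather compute a modular inverse.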



4.3.6 Continued fraction 

A continued fraction is an expression obtained through an iterative process of representing 
a number as the sum of its integer part and the reciprocal of another number, then writing 
this other number as the sum of its integer part and another reciprocal, and so on. In a finite 
continued fraction (or terminated continued fraction), the iteration/recursion is terminated after 
finitely many steps by using an integer in lieu of another continued fraction. In contrast, an 
infinite continued fraction is an infinite expression. In either case, all integers in the sequence, 
other than the first, must be positive. The integers $a_i$ are named the "coefficients" or "terms" of
the continued fraction.

The notion of continued fraction dates back to the time of Fermat and culminated with the
work of Lagrange and Legendre in the late 18th century. These fractions are important in
physics because we meet them in our study of acoustics, in the thought process that
led Galois to create his group theory, and in the study of gear ratios (for watch complications,
as discussed in the section Mechanical Engineering).

To understand the motivation of continued fractions, let us introduce a basic example.

Consider a typical rational number:

$\frac{415}{93}$

which is around 4.4624.

As a first approximation, start with 4, which is the integer part:

$\frac{415}{93} = 4 + \frac{43}{93}$

Note that the fractional part is the reciprocal of 93/43, which is about 2.1628. Use the integer
part, 2, as an approximation for the reciprocal, to get a second approximation of:

$4 + \frac{1}{2} = 4.5$

So we have so far:

$\frac{93}{43} = 2 + \frac{7}{43} \Rightarrow \frac{415}{93} = 4 + \cfrac{1}{2 + \cfrac{7}{43}}$

Note that the fractional part 7/43 is the reciprocal of 43/7, which is about 6.1429. Use the integer
part, 6, as an approximation for the reciprocal, to get a third approximation of:

$4 + \cfrac{1}{2 + \cfrac{1}{6}} = \frac{58}{13} \approx 4.4615$

and, since 43/7 = 6 + 1/7:

$\frac{415}{93} = 4 + \cfrac{1}{2 + \cfrac{1}{6 + \cfrac{1}{7}}}$



Note that the fractional part 1/7 is the reciprocal of 7, which is exactly 7: the process stops
here, with 7 as the last term. Therefore we get:

$\frac{415}{93} = 4 + \cfrac{1}{2 + \cfrac{1}{6 + \cfrac{1}{7}}}$

This expression is named, as we know, the "continued fraction representation of the number".

Dropping the redundant parts of the expression gives the abbreviated notation:

$\frac{415}{93} = [4; 2, 6, 7]$


Note that it is customary to replace only the first comma by a semicolon. 
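
The successive steps of this example are nothing else than repeated Euclidean divisions, which the following Python sketch makes explicit. The function names `continued_fraction` and `evaluate` are our own choices for the illustration.

```python
from fractions import Fraction

def continued_fraction(a, b):
    """Terms [q1, q2, ..., qn] of the continued fraction of a/b (b > 0)."""
    terms = []
    while b:
        q, r = divmod(a, b)   # Euclidean division: a = q*b + r
        terms.append(q)
        a, b = b, r
    return terms

def evaluate(terms):
    """Rebuild the rational number from its continued fraction terms."""
    value = Fraction(terms[-1])
    for q in reversed(terms[:-1]):
        value = q + 1 / value
    return value

print(continued_fraction(415, 93))   # [4, 2, 6, 7]
print(evaluate([4, 2, 6, 7]))        # 415/93
```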

As a generalization of the previous example, let us first consider the rational number a/b
with (a, b) = 1, b > 0 and a > b. We know that all the quotients $q_i$ and the remainders $r_i$
of the Euclidean division are positive integers.

Let us recall the Euclidean algorithm already seen earlier (but written in a slightly different
way):

$\frac{a}{b} = q_1 + \frac{r_1}{b}$

$\frac{b}{r_1} = q_2 + \frac{r_2}{r_1}$

$\frac{r_1}{r_2} = q_3 + \frac{r_3}{r_2}$

$\ldots$

$\frac{r_{n-2}}{r_{n-1}} = q_n$


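By successive substitutions, we get:

$\frac{a}{b} = q_1 + \cfrac{1}{b/r_1} = q_1 + \cfrac{1}{q_2 + \cfrac{1}{r_1/r_2}} = \ldots = q_1 + \cfrac{1}{q_2 + \cfrac{1}{\ddots + \cfrac{1}{q_n}}}$

What is also sometimes written, with $\varepsilon_1 = r_1/b$ and $\varepsilon_i = r_i/r_{i-1}$:

$\frac{a}{b} = q_1 + \cfrac{1}{1/\varepsilon_1} = q_1 + \cfrac{1}{q_2 + \cfrac{1}{1/\varepsilon_2}} = \ldots$

So any positive rational number can be expressed as a finite continued fraction where $q_n \in \mathbb{N}$.
Taking our introductory example:

$\frac{415}{93} = 4 + \cfrac{1}{93/43} = 4 + \cfrac{1}{2 + \cfrac{1}{43/7}} = 4 + \cfrac{1}{2 + \cfrac{1}{6 + \cfrac{1}{7}}}$

we notice indeed that $q_n \in \mathbb{N}$ and that we have by construction:

$q_n = \left[ \frac{1}{\varepsilon_{n-1}} \right]$

where the brackets represent the integer part, and that we also have:

$\frac{1}{\varepsilon_{n-1}} = q_n + \varepsilon_n$

The development of the number a/b is named the "development of the number a/b in finite
continued fraction" and is condensed in the following notation:

$\frac{a}{b} = [q_1; q_2, \ldots, q_n]$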

Let us see now another example: 




Let us see how to extract the square root of a number A (for example A = 2,
so that we want to extract $\sqrt{2}$) by the continued fraction method.

Let a be the largest integer whose square $a^2$ is smaller than A. We subtract it from A. So
there is a remainder of (for A = 2, we have a = 1):

$r = A - a^2 = (\sqrt{A} - a)(\sqrt{A} + a)$ (4.160)

where we have used a remarkable identity that we will prove in the section Calculus
later. Hence, dividing both members by the second parenthesis, we have:

$\sqrt{A} - a = \frac{r}{\sqrt{A} + a}$

that is:

$\sqrt{A} = a + \frac{r}{\sqrt{A} + a}$

In the denominator, we replace $\sqrt{A}$ by:

$a + \frac{r}{\sqrt{A} + a}$

That gives:

$\sqrt{A} - a = \cfrac{r}{2a + \cfrac{r}{\sqrt{A} + a}}$

etc. We thus see that it is simple to determine the expression of a square root
in terms of a continued fraction. For A = 2 (a = 1, r = 1) this gives $\sqrt{2} = [1; 2, 2, 2, \ldots]$.
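
As a numerical illustration of this method, the sketch below iterates the substitution a fixed number of times and truncates the innermost fraction to zero. The function name `sqrt_continued_fraction` and the `depth` parameter are assumptions of this example, and A is assumed to be an integer with A ≥ 1.

```python
from fractions import Fraction
from math import isqrt

def sqrt_continued_fraction(A, depth):
    """Rational approximation of sqrt(A) from `depth` levels of
    sqrt(A) = a + r / (2a + r / (2a + ...)), with a = floor(sqrt(A))."""
    a = isqrt(A)          # largest integer with a*a <= A
    r = A - a * a         # remainder r = A - a^2
    tail = Fraction(0)    # truncation: innermost r/(sqrt(A) + a) dropped
    for _ in range(depth):
        tail = Fraction(r) / (2 * a + tail)
    return a + tail

for depth in (1, 2, 3):
    print(sqrt_continued_fraction(2, depth))   # 3/2, 7/5, 17/12
```

Exact rationals (`Fraction`) are used rather than floats so that each level of the expansion can be read off directly.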

We now consider as intuitive that every rational number can be expressed as a finite continued
fraction and, conversely, that any finite continued fraction represents a rational number. By
extension, an irrational number is represented by an infinite continued fraction!

Now consider $[q_1; q_2, q_3, \ldots, q_n]$ a finite continued fraction. The continued fraction:

$C_k = [q_1; q_2, \ldots, q_k]$ (4.165)

where k = 1, 2, ..., n is named the "k-th reduced" or "k-th convergent" or the "k-th partial
quotient".


With this notation, we have:

$C_1 = [q_1] = q_1$

$C_2 = [q_1; q_2] = q_1 + \frac{1}{q_2} = \frac{q_1 q_2 + 1}{q_2}$

$C_3 = [q_1; q_2, q_3] = q_1 + \cfrac{1}{q_2 + \cfrac{1}{q_3}} = \frac{(q_1 q_2 + 1) \cdot q_3 + q_1}{q_2 q_3 + 1}$

$C_4 = [q_1; q_2, q_3, q_4] = \frac{((q_1 q_2 + 1) \cdot q_3 + q_1) \cdot q_4 + (q_1 q_2 + 1)}{(q_2 q_3 + 1) \cdot q_4 + q_2}$

To simplify the expressions above, we introduce the sequences $\{n_i\}$, $\{d_i\}$ (n is for
numerator and d for denominator) defined by:

$n_0 = 1, \quad n_1 = q_1, \quad \ldots, \quad n_i = q_i n_{i-1} + n_{i-2}$

$d_0 = 0, \quad d_1 = 1, \quad \ldots, \quad d_i = q_i d_{i-1} + d_{i-2}$

Thanks to this construction, we have an interesting immediate little inequality that will be useful
to us further below (with equality between $d_1$ and $d_2$ only when $q_2 = 1$):

$0 = d_0 < d_1 \leq d_2 < d_3 < \ldots$

With the above definition, we find that:

$C_1 = \frac{n_1}{d_1}, \quad C_2 = \frac{n_2}{d_2}, \quad C_3 = \frac{n_3}{d_3}, \quad \ldots, \quad C_i = \frac{n_i}{d_i}$

or, generalizing:

$C_k = [q_1; q_2, q_3, \ldots, q_k] = \frac{n_k}{d_k}$


Now let us show for later use that for $i \geq 1$, we have:

$n_i d_{i-1} - d_i n_{i-1} = (-1)^i$ (4.171)

The result is immediate for i = 1. Assuming that the result is true for i, let us show that it is true
for i + 1. Since:

$n_{i+1} d_i - d_{i+1} n_i = (q_{i+1} n_i + n_{i-1}) d_i - (q_{i+1} d_i + d_{i-1}) n_i = -(n_i d_{i-1} - d_i n_{i-1}) = (-1)^{i+1}$


then using the induction hypothesis, we get the result!
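
The recurrences for the numerators and denominators, and the identity (4.171), can be verified numerically on the introductory example [4; 2, 6, 7]. The sketch below is only illustrative; the function name `convergents` is our own, and the lists are indexed from 0 to match the sequences of the text.

```python
from fractions import Fraction

def convergents(q):
    """Given q = [q1, ..., qn], return the lists n and d with
    n[0] = 1, n[1] = q1, d[0] = 0, d[1] = 1 and
    n[i] = q_i*n[i-1] + n[i-2], d[i] = q_i*d[i-1] + d[i-2]."""
    n, d = [1, q[0]], [0, 1]
    for qi in q[1:]:
        n.append(qi * n[-1] + n[-2])
        d.append(qi * d[-1] + d[-2])
    return n, d

n, d = convergents([4, 2, 6, 7])
for i in range(1, len(n)):
    # Identity (4.171): n_i d_{i-1} - d_i n_{i-1} = (-1)^i
    assert n[i] * d[i - 1] - d[i] * n[i - 1] == (-1) ** i

C = [Fraction(n[i], d[i]) for i in range(1, len(n))]
print(C)   # [Fraction(4, 1), Fraction(9, 2), Fraction(58, 13), Fraction(415, 93)]
```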

We can now establish a vital relation for what will follow.

Theorem 4.25. If $C_k$ is the k-th reduced of the simple finite continued
fraction $[q_1; q_2, \ldots, q_n]$, then:

$C_1 < C_3 < C_5 < \ldots < C_6 < C_4 < C_2$ (4.173)



Proof 4.25.1.

$C_{k+2} - C_k = (C_{k+2} - C_{k+1}) + (C_{k+1} - C_k) = \left( \frac{n_{k+2}}{d_{k+2}} - \frac{n_{k+1}}{d_{k+1}} \right) + \left( \frac{n_{k+1}}{d_{k+1}} - \frac{n_k}{d_k} \right)$

$= \frac{n_{k+2} d_{k+1} - n_{k+1} d_{k+2}}{d_{k+2} d_{k+1}} + \frac{n_{k+1} d_k - n_k d_{k+1}}{d_{k+1} d_k} = \frac{(-1)^{k+2}}{d_{k+2} d_{k+1}} + \frac{(-1)^{k+1}}{d_{k+1} d_k}$

$= \frac{(-1)^{k+2} d_k + (-1)^{k+1} d_{k+2}}{d_{k+2} d_{k+1} d_k} = \frac{(-1)^{k+1} (-1) d_k + (-1)^{k+1} d_{k+2}}{d_{k+2} d_{k+1} d_k} = \frac{(-1)^{k+1} (d_{k+2} - d_k)}{d_{k+2} d_{k+1} d_k}$

Since $d_{k+2} = q_{k+2} d_{k+1} + d_k$ with $q_{k+2} d_{k+1} \geq 1$, we have:

$d_{k+2} - d_k > 0$

indicating to us that the sign of $C_{k+2} - C_k$ is the same as that of $(-1)^{k+1}$.

It follows that $C_{k+2} > C_k$ for k odd, and $C_{k+2} < C_k$ for k even. Then:

$C_1 < C_3 < C_5 < \ldots$ and $C_2 > C_4 > C_6 > \ldots$

and then, as:

$C_k - C_{k-1} = \frac{n_k d_{k-1} - n_{k-1} d_k}{d_k d_{k-1}} = \frac{(-1)^k}{d_k d_{k-1}}$

for k even we have $C_k > C_{k-1}$. We therefore deduce that:

$C_1 < C_3 < C_5 < \ldots < C_6 < C_4 < C_2$

□ Q.E.D.

Let us show now that every infinite continued fraction represents a real number.

In formal terms, if $\{q_n\}$ is a sequence of positive integers and we consider $C_n =
[q_1; q_2, \ldots, q_n]$, then $C_n$ necessarily converges to a real number when $n \to +\infty$.

Actually it is not difficult to observe (it's quite intuitive) with a practical example that we have:

$C_k - C_{k-1} \to 0$ (4.180)

when $k \to +\infty$.

Now, let us denote by x any real number and by $q_1 = [x]$ the integer part of this real number. Then
we saw at the beginning of our study of continued fractions that:

$x = q_1 + \varepsilon_1$



Therefore it comes:

$\varepsilon_1 = x - q_1$


Let us look, for the needs of the section on Acoustics, at the calculation of the continued fraction
of a logarithm using the previous relation!

First let us recall that:

$\frac{a}{b} = q_1 + \cfrac{1}{1/\varepsilon_1} = q_1 + \cfrac{1}{q_2 + \cfrac{1}{1/\varepsilon_2}}$

Consider now (relation proved in the section Functional Analysis):

$x = \log_a(u) = \frac{\ln(u)}{\ln(a)}$

with 1 < a < u and (a, u) = 1. Given $y_n$ defined by:

$y_{-1} = u, \quad y_0 = a, \quad y_{n+1} = \frac{y_{n-1}}{y_n^{q_{n+1}}}$

let us prove that:

$\varepsilon_n = \frac{\ln(y_{n-2})}{\ln(y_{n-1})} - q_n$

Indeed for n = 1 we have:

$\varepsilon_1 = x - q_1 = \frac{\ln(u)}{\ln(a)} - q_1 = \frac{\ln(y_{-1})}{\ln(y_0)} - q_1$

for n = 2 we use first the fact that:

$\varepsilon_1 = x - q_1 = \frac{\ln(u)}{\ln(a)} - q_1 = \frac{\ln(u) - q_1 \ln(a)}{\ln(a)} = \frac{\ln(u) - \ln(a^{q_1})}{\ln(a)}$

$= \frac{\ln(u / a^{q_1})}{\ln(a)} = \frac{\ln(y_{-1} / y_0^{q_1})}{\ln(a)} = \frac{\ln(y_1)}{\ln(a)}$

hence:

$\frac{1}{\varepsilon_1} = \frac{\ln(a)}{\ln(y_1)}$

and as we had proved that:

$\frac{1}{\varepsilon_{n-1}} = q_n + \varepsilon_n \Rightarrow \varepsilon_2 = \frac{1}{\varepsilon_1} - q_2 = \frac{\ln(a)}{\ln(y_1)} - q_2 = \frac{\ln(y_0)}{\ln(y_1)} - q_2$

etc., which proves by induction that we are entitled to use this change of notation.
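
The sequence $y_n$ gives a way to compute the terms of the continued fraction of a logarithm with exact rational arithmetic only, without evaluating any logarithm. The following Python sketch assumes the indexing convention $y_{n+1} = y_{n-1} / y_n^{q_{n+1}}$, where $q_{n+1}$ is the largest integer such that $y_n^{q_{n+1}} \leq y_{n-1}$; the function name `log_continued_fraction` is our own.

```python
from fractions import Fraction

def log_continued_fraction(a, u, count):
    """First `count` terms q1, q2, ... of the continued fraction of
    log_a(u), for integers 1 < a < u, using exact rationals."""
    prev, cur = Fraction(u), Fraction(a)    # y_{-1} and y_0
    quotients = []
    for _ in range(count):
        if cur == 1:
            break                            # log_a(u) is rational: expansion ends
        q, power = 0, cur
        while power <= prev:                 # largest q with cur**q <= prev
            q += 1
            power *= cur
        quotients.append(q)
        prev, cur = cur, prev / cur ** q     # next y: y_{n-1} / y_n**q
    return quotients

print(log_continued_fraction(2, 3, 6))   # [1, 1, 1, 2, 2, 3]
```

The numerators and denominators of the $y_n$ grow quickly, so this approach is only practical for the first few terms.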







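Let us look for the expression of the continued fraction of:

$x = \log_a(u) = \log_2(3)$ (4.191)

We know, by playing with the definition of the logarithm, that:

$2^1 < 3 < 2^2$ (4.192)

hence:

$\log_2(2^1) < \log_2(3) < \log_2(2^2) \Rightarrow \log_2(2) < \log_2(3) < 2 \log_2(2)$
$\Rightarrow 1 < \frac{\log_2(3)}{\log_2(2)} < 2 \Rightarrow 1 < \log_2(3) < 2$

therefore $q_1 = 1$. Then we have:

$\varepsilon_1 = \frac{\ln(y_{1-2})}{\ln(y_{1-1})} - q_1 = \frac{\ln(y_{-1})}{\ln(y_0)} - q_1$

and as:

$y_{-1} = u = 3, \quad y_0 = a = 2$

it comes:

$\varepsilon_1 = \frac{\ln(3)}{\ln(2)} - 1$

So we have the first partial quotient:

$\log_2(3) = q_1 + \cfrac{1}{1/\varepsilon_1} = 1 + \cfrac{1}{\left( \frac{\ln(3)}{\ln(2)} - 1 \right)^{-1}}$

We already have:

$y_{0+1} = y_1 = \frac{y_{-1}}{y_0^{q_1}} = \frac{3}{2^1} = \frac{3}{2}$

Let us simplify:

$\left( \frac{\ln(3)}{\ln(2)} - 1 \right)^{-1} = \frac{\ln(2)}{\ln(3/2)} = \frac{\ln(2)}{\ln(y_1)}$

So the first partial quotient can be written:

$\log_2(3) = 1 + \cfrac{1}{\ln(2) / \ln(y_1)}$
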

Set Theory 

During our study of numbers, operators, and number theory (in the chapters of the
same name), we often used the terms "group", "ring", "field", "homomorphism",
etc., and we will continue to do so many times. Besides the fact that
these concepts are of utmost importance for giving proofs or building mathematical
concepts essential to the study of contemporary theoretical physics (quantum field physics,
string theory, standard model, ...), they allow us to understand the components and the basic
properties of mathematics and its operators by storing them in separate categories. So choosing
to present Set Theory as the 5th chapter of this book is a very questionable choice since,
rigorously, it is where almost everything begins... However, we still needed to present Proof
Theory first, for the notations and methods that will be used here.

Moreover, when modern mathematics was taught in primary or secondary school (in the 1970s),
the language of sets and the preliminary study of binary relations were introduced in order to
give a more rigorous approach to the notion of functions and applications (mappings), and to
mathematics in general (see definition below).

Definition (#62): We call "arrow diagram" (or "sagittal diagram", from the Latin "sagitta" =
arrow) any diagram showing a correspondence between the components of two sets, connected
wholly or partially by a set of arrows.

For example, the graphical representation of a function defined from the set E =
{-3, -2, -1, 0, 1, 2, 3} to the set F = {0, 1, 2, 4, 9} leads to a sagittal diagram:

[Figure: sagittal diagram from E to F]

A relation from E to E provides an arrow diagram of the type:

Figure 4.33 - Function returning in its own set of definitions

The loop on each element shows a "reflexive relation" and the systematic presence of a
return arrow indicates a "symmetric relation".

Definition (#63): If the target set is identical to the original set, we say that we have a "binary
relation".

However, choosing to introduce Set Theory in school classrooms also has another reason.
In fact, for the sake of internal rigor (i.e. not related to reality), a very large part of mathematics
was rebuilt within a single axiomatic framework, called "Set Theory", in the sense that each
mathematical concept (previously independent of the others) is brought back to a definition whose
logical components all come from this same framework: it is regarded as fundamental! Thus,
the rigor of reasoning carried out within Set Theory is guaranteed by the fact that the framework is
"non-contradictory" or "consistent". Let us now see the definitions that build this framework.

Definitions (#64): 

D1. We name "set" any list, collection or gathering of well-defined objects, given explicitly or
implicitly.

D2. A "Universe" U is an object whose constituents are sets.

Note that what mathematicians name "Universe" is not a set! In fact it is a model
that satisfies the axioms of sets.

Indeed, we will see that we cannot talk about the set of all sets (because this is
not a set) to designate the object that consists of all the sets, and that is why we talk about
a "Universe".

D3. We name "elements" or "members of the set" the objects belonging to the set and we write:

$p \in A$ (5.1)

if p is an element of the set A and, in the contrary case:

$p \notin A$ (5.2)

If B is a "part" of A, or "subset" of A, we write:

$B \subset A$ or $A \supset B$ (5.3)




that is to say:

$\forall x, \; x \in B \Rightarrow x \in A$

E1. $A = \{1, 2, 3\}$

E2. $X = \{x \mid x \text{ is a positive integer}\}$

D4. We can provide sets with a number of relations that compare (useful sometimes...) their
elements or some of their properties. These relations are called "comparison
relations" or "order relations" (see section Operators).


R1. The structure of ordered set was originally set up in the framework of
Number Theory by Cantor and Dedekind.

R2. As we have proved in the chapter on Operators, $\mathbb{N}, \mathbb{Z}, \mathbb{Q}, \mathbb{R}$ are totally ordered
by the usual relations $\leq, \geq$. The relation <, often called "strict order", is not an
order relation because it is not reflexive (see section Operators). In $\mathbb{N}$, for
example, the relation "a divides b", often denoted by the symbol "|", is a partial order.

R3. If R is an order on E and F is a subset of E, the restriction of the relation
R to F is an order on F, called the "order induced by R on F".

R4. If R is an order on E, the relation R' defined by:

$x R' y \Leftrightarrow y R x$

is an order on E, called the "reciprocal order" of R. The reciprocal order of the usual order
$\leq$ is the order noted $\geq$, and the reciprocal order of the order "a divides b" in $\mathbb{N}$ is the order
"b is a multiple of a".


The set is the basic mathematical entity whose existence is postulated: it is not defined by itself but
by its properties, given by the axioms. It relies on a human process: a kind of categorization faculty,
which allows thought to distinguish several independent qualified elements.

Theorem 4.26. We can demonstrate from these concepts that the number of subsets of a set of
cardinal n is $2^n$.

Proof 4.26.1. First there is the empty set $\varnothing$, that is 0 items chosen from n, i.e. $C_n^0$ (notation of the
binomial coefficient not conforming to ISO 31-11!), as we have seen in the chapter Probabilities:

$C_n^k = \frac{A_n^k}{k!} = \frac{n!}{k! (n - k)!}$

then the n singletons, i.e. $C_n^1$, then the pairs, i.e. $C_n^2$, and so on...

The number of subsets (cardinal) of E corresponds to the summation of all binomial
coefficients:

$\mathrm{Card}(P(E)) = \sum_{k=0}^{n} C_n^k$

But we have (see section Algebraic Calculation):

$(x + y)^n = \sum_{k=0}^{n} C_n^k x^k y^{n-k}$ (5.7)

Therefore, with x = y = 1:

$\mathrm{Card}(P(E)) = \sum_{k=0}^{n} C_n^k 1^k 1^{n-k} = (1 + 1)^n = 2^n$

□ Q.E.D.


Consider the set $S = \{x_1, x_2, x_3\}$. The set of all its parts P(S) consists of:

- The empty set: $\{\} = \varnothing$

- The singletons: $\{x_1\}, \{x_2\}, \{x_3\}$

- The pairs: $\{x_1, x_2\}, \{x_1, x_3\}, \{x_2, x_3\}$

- Itself: $\{x_1, x_2, x_3\}$

Such that:

$P(S) = \{\varnothing, \{x_1\}, \{x_2\}, \{x_3\}, \{x_1, x_2\}, \{x_1, x_3\}, \{x_2, x_3\}, \{x_1, x_2, x_3\}\}$

Which indeed makes 8 elements!

The order in which the elements are listed does not come into account when
counting the parts of the original set.
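
The count of $2^n$ subsets can be checked by enumeration. The sketch below groups the subsets by size k, exactly as in the proof above; the function name `power_set` is our own.

```python
from itertools import combinations

def power_set(elements):
    """List all subsets of `elements`, grouped by increasing size k."""
    items = list(elements)
    return [set(c)
            for k in range(len(items) + 1)
            for c in combinations(items, k)]

subsets = power_set(["x1", "x2", "x3"])
print(len(subsets))   # 8
```

For each k, `combinations(items, k)` yields exactly $C_n^k$ subsets, so the total length is the sum of the binomial coefficients.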

In Applied Mathematics, we work almost exclusively with sets of numbers. Therefore, we will
limit our study to the definitions and properties of these.

Now let us formalize the basic concepts for working with the most common sets we encounter
in the basic school curriculum.

5.1 Zermelo-Fraenkel Axiomatic 

The Zermelo-Fraenkel axiomatic, sometimes abbreviated "ZF-C axioms", shown below was
formulated by Ernst Zermelo and Abraham Adolf Fraenkel in the early 20th century
and completed by the axiom of choice (hence the capital C in ZF-C). It is considered the
most natural axiomatic structure in the context of set theory.


There are many other axiomatic structures, based on the more general concept of "class",
as developed by von Neumann, Bernays and Gödel (for the notations, see section Proof
Theory).

Strictly technically speaking, the ZF axioms are statements of first-order predicate
calculus with equality (see section Proof Theory), in a language with only one primitive symbol,
for membership (a binary relation). The following should therefore only be seen as an attempt (...)
to express in English the expected meaning of these axioms.

A1. Axiom of extensionality:

Two sets are equal if and only if they have the same elements. This is what we note:

$A = B \Leftrightarrow (\forall x \in A, x \in B) \wedge (\forall x \in B, x \in A)$ (5.10)

So A and B are equal if every element x of A is also in B and every element x of B also
belongs to A.

A2. Axiom of empty set:

The empty set exists, we note it:

$\varnothing$ (5.11)

and it has no element; its cardinality is therefore 0. In fact this axiom can be deduced
from another axiom that we will see a little further, but it is convenient to introduce it
for teaching in high-school classes.

A3. Axiom of pairing:

If A and B are two sets, then there exists a set C containing A and B, and only them, as
elements. This set C is then noted {A, B}. In terms of the sets considered as
elements, this gives:

$\forall A \forall B \exists C : A \in C \wedge B \in C$ (5.12)

This axiom also shows the existence of the "singleton", a set noted:

$\{X\}$ (5.13)

which is a set whose only element is X (and therefore of cardinal 1). We simply
need to apply the axiom with A equal to B.

A4. Axiom of the sum (also called "axiom of union"):

This axiom allows us to build the union (merge) of sets. Said in the most common way: the
union (merge) of any family of sets is... a set. The union of any family of sets is often
noted:

$\bigcup_i A_i$ (5.14)


or, if we take some of its elements:

$\bigcup_i Z_i$ (5.15)


A5. Axiom of subsets:

It expresses that for any set A, the set of all its parts P(A) exists (not to be confused with
the "P" of probability!). So to any set A we can associate a set B which contains exactly
the parts C (that is, the subsets) of the first:

$\forall A \exists B \forall C : (C \in B \Leftrightarrow C \subset A)$ (5.16)

A6. Axiom of infinity:

This axiom expresses the fact that there exists an infinite set. To formalize it, we say that
there exists a set A, called an "autosuccessor set", containing $\varnothing$ (the empty set) and such that if x
belongs to A, then $x \cup \{x\}$ also belongs to A:

$A \text{ is autosuccessor} \Leftrightarrow (\varnothing \in A) \wedge (x \in A \Rightarrow (x \cup \{x\}) \in A)$ (5.17)

This axiom expresses for example that the set of natural integers exists. Indeed, $\mathbb{N}$ is the
smallest autosuccessor set in the sense of inclusion, $\mathbb{N} = \{\varnothing, \{\varnothing\}, \{\varnothing, \{\varnothing\}\}, \ldots\}$, and by
convention we note (this is how we build the natural numbers):

$0 = \varnothing$

$1 = \{\varnothing\}$

$2 = \{\varnothing, \{\varnothing\}\}$


A7. Axiom of regularity (also called "foundation axiom"):

The main purpose of this axiom is simply to eliminate the possibility of having A as an element of
itself.

Thus, for any non-empty set A, there exists a set B which is an element of A such that no
element of A is an element of B (you must distinguish the levels of language used: a
set and its elements do not have the same status!), which we note:

$\forall A \neq \varnothing : \exists B \in A, A \cap B = \varnothing$ (5.19)

and thus the result we expected:

$\forall A, A \notin A$ (5.20)



Proof 4.26.2. Indeed, let A be a set such that $A \in A$. Consider the singleton {A}, the set
whose only element is A. According to the axiom of foundation, there must be an element
of this singleton that has no element in common with it. But the only possible element
is A itself, that is to say that we must have:

$A \cap \{A\} = \varnothing$ (5.21)

But by hypothesis $A \in A$ and by construction $A \in \{A\}$. So:

$A \in (A \cap \{A\})$ (5.22)

which contradicts the previous assertion. Therefore:

$A \notin A$ (5.23)

□ Q.E.D.

A8. Axiom of replacement (also called "Axiom schema of replacement"):

This axiom expresses the fact that if a formula f is a functional, then for any set A there
is a set B consisting precisely of the images of the elements of A by this functional.

So, a little more formally, given a set A of elements a and a binary relation f (which is
quite generally a functional), there exists a set B consisting of the elements b such that f(a, b)
is true. If f is a function where b is not free, then it means that:

$b = f(a)$ and $B = f(A)$ (5.24)

In a technical way we write this axiom as follows:

$\forall A \, (\forall a \in A \, \exists! b : f(a, b)) \Rightarrow \exists B \, \forall a \in A \, \exists b \in B \, f(a, b)$ (5.25)

So for every set A and any element a it contains, if there is one and only one b defined by the
functional f, then there exists a set B such that for any element a belonging to the set A
there is a b belonging to the set B defined by the functional f.

Let's see an example with the following binary predicate, which for any value of a from
A determines the value of b in B:

$P(a, b) = (a = 1 \wedge b = 2) \vee (a = 3 \wedge b = 4)$ (5.26)

Therefore, from the knowledge that a is equal to 1 we derive that b is equal to 2, and similarly
(i.e. by replacement) when a is equal to 3 we derive that b is equal to 4.

This small example shows well the strong relation that exists when we consider the
predicate P as a naive function! Moreover, as there is an infinity of possible functionals f, the
replacement schema stands for an infinite number of axioms.

A9. Axiom of selection (also called "Axiom schema of comprehension"):

This axiom simply expresses that for any set A and any property P expressible in the
language of set theory, the set of all elements of A satisfying the property P exists.


So, more formally, to any set A and any condition or proposition P(x), there corresponds a set B
whose elements are exactly the elements x of A for which P(x) is true. This is what we
note:

$B = \{x \in A : P(x)\}$ (5.27)

In a more comprehensive and rigorous way we have in fact, for any functional f that does
not include B as a free variable:

$\forall A \exists B \forall a : a \in B \Leftrightarrow a \in A \wedge f$ (5.28)

It is typically the axiom that we use to construct the set of even numbers:

$\{a \in \mathbb{N} \mid \exists b \in \mathbb{N}, a = 2b\}$ (5.29)

or to prove the existence of the empty set (which makes the axiom of the empty set redundant),
because it suffices to select, in any set A, the elements x satisfying the property:

$x \neq x$ (5.30)

whatever the set A: no element satisfies it, so the set obtained by the selection axiom is the
empty set.

The compliance with the strict conditions of this axiom eliminates the paradoxes of
"naive set theory", such as Russell's paradox or Cantor's paradox, which invalidated naive set
theory.

For example, consider the Russell set R of all sets that do not contain themselves (note
that we give a property of R without specifying what this set is):

$R = \{E : E \notin E\}$ (5.31)

The problem is to know whether or not R contains itself. If $R \in R$, then R is
self-contained and, by definition, $R \notin R$, and vice versa. Each possibility is contradictory.

If we now denote by C the set of all sets (Cantor Universe) we have in particular:

$P(C) \in C$ (5.32)

which is impossible (i.e. with the power of the continuum of real numbers), according to
Cantor's theorem (see section Numbers).

These "paradoxes" (or "syntactic antinomies") come from a non-compliance with the
conditions of application of the selection axiom: to define R (in the example of Russell),
there must be a proposition P bearing on the elements of a given set, which should be made
explicit. The proposition defining the set of Russell or that of Cantor does not indicate from
which set the elements E are taken. It is therefore invalid!

A very nice and well known example (this is why we present it) helps to better understand 
(this is the "Russel paradox" which we have already spoken about int length in the section 
on Proof Theory): 

A young student went one day to his barber. He entered into conversation and asked him 
if he had many competitors in his pretty city. Seemingly innocent way, the barber replied, 
«I have no concurrence. Because of all the men of the city, I obviously do not shave 


info @ sciences. ch 

EAME v3. 5-2013 

4. Arithmetic 

those who shave themselves, but I am fortunate to shave, all those who do not shave 

What then in such a so simple statement could take to the fault the logic of our young 
smart student? 

The answer is in fact innocent, until we decide to apply to the case of the barber: Does he 
shaves himself, Yes or No? 

Suppose he shaves himself: he then belongs to the category of those who shave themselves, those whom the barber said he of course does not shave... So he does not shave himself.

Finally, this unfortunate barber is in a strange position: if he shaves himself, he does not shave himself, and if he does not shave himself, he shaves himself. This logic is self-destructive, stupidly contradictory, rationally irrational.

Then comes the selection axiom: we exclude the barber from the set of persons to which the declaration applies. In reality, the problem is that the barber is himself a member of the set of all the men of the city. So what applies to all the men cannot be applied to the individual case of the barber.
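The contradiction can be checked mechanically. The following is only an illustrative Python sketch (the function name is our own, hypothetical choice): it encodes the barber's rule applied to the barber himself and tries both possible answers.

```python
def rule_holds_for_barber(shaves_himself):
    """The barber shaves a man exactly when that man does not shave himself.

    Applied to the barber, "the barber shaves him" and "he shaves himself"
    are one and the same statement.
    """
    barber_shaves_him = shaves_himself
    return barber_shaves_him == (not shaves_himself)


# Try both answers to the question "does the barber shave himself?"
consistent_answers = [answer for answer in (True, False)
                      if rule_holds_for_barber(answer)]
print(consistent_answers)
```

Neither answer satisfies the rule, so the list of consistent answers is empty.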

A10. Axiom of choice:

Given a set A of non-empty mutually disjoint sets, there exists a set B (the set of choices for A) containing exactly one element from each member of A.

However, let us note that the question of axiomatization, and therefore of the foundations, was still shaken by two questions at the time of its construction: which axioms should be accepted as valid and, within a given system of axioms, is mathematics coherent (do we not run the risk of meeting a contradiction)?

The first question was raised by the continuum hypothesis: if we can put two sets of numbers in term-by-term correspondence, they have the same number of elements (cardinal). We can thus map the integers onto the rational numbers, as we have shown in the section on Numbers, so they have the same cardinality, but we cannot map the integers onto all the real numbers. The question then is whether or not there is a set whose number of elements lies between the two. This question is important for building the classical theory of analysis, and mathematicians usually choose to say there is none, but we can also say the opposite.

In fact the continuum hypothesis is linked, more profoundly than we might think, to the axiom of choice, which can also be formulated as follows: if C is a collection of non-empty sets, then we can select an element from each set of the collection. If C has a finite or a countable number of elements, the axiom seems pretty trivial: we can sort and number the sets of C, and the selection of an element in each set is simple. Where it begins to get complicated is when the set C has the power of the continuum: how do we choose the elements if it is not possible to number them?

Finally, in 1938 Kurt Gödel showed that if set theory is consistent without the axiom of choice and without the continuum hypothesis, then it is consistent with them as well! And to end it all, Paul Cohen showed in 1963 that neither the axiom of choice nor the continuum hypothesis can be proved from the other axioms: both are independent of them.

To give a pedagogical summary of all this, consider the following figure (which excludes the axiom of choice):


• If two sets have the same elements, then they are equal.

• We can form a subset of a set, which consists of some elements.

• There is a set with no members, written as { } or $\emptyset$.

• Given two objects x and y we can form a set {x, y}.

• We can form the union of two or more sets.

• Given any set, we can form the set of all subsets (the power set).

• There is a set with infinitely many elements.

• Sets are built up from simpler sets, meaning that every (non-empty) set has a minimal element.

• If we apply a function to every element in a set, the answer is still a set.

Figure 4.34 - Zermelo-Fraenkel axioms visual summary (source:?)

5.1.1 Cardinals 

Definition (#65): Sets are said to be "equipotent" if there exists a bijection (one-to-one correspondence) between these sets. We then say they have the same "cardinal", which the ISO 31-11 standard recommends writing card(S), but in this book we will also use the notation Card(S) (many U.S. books use the non-official notations |S|, which looks exactly like the absolute value, or #S).

Thus, more rigorously, a cardinal (which quantifies the number of items in the set) is an equiv- 
alence class (see section Operators) for the relation of equipotence. 


Cantor is the main creator of set theory, in a form that we name today "naive set theory". But, apart from elementary considerations, his theory also contained higher levels of abstraction. The real novelty of Cantor's theory is that it lets us talk about infinity. For example, an important idea of Cantor was precisely to define "equipotence".

If we write $c_1 = c_2$ as an equality of cardinals, we mean by that that there are two equipotent sets A and B such that:

$c_1 = \mathrm{Card}(A)$ and $c_2 = \mathrm{Card}(B)$


Cardinals can be compared. The order thus defined is a total ordering (see section Operators) between the cardinals (the proof that the order relation is total uses the axiom of choice, and the proof that it is antisymmetric is known under the name of the Cantor-Bernstein theorem, which we will prove further below).

Saying that $c_1 < c_2$ means, in simple language, that A is equipotent to a proper part of B, but B is not equipotent to any part of A. Mathematicians would say that Card(A) is smaller than or equal to Card(B) if there is an injection of A into B.

We saw during our study of numbers (see section Numbers), especially of transfinite numbers, that a set equipotent to (in bijection with) $\mathbb{N}$ is called a "countable set".

Let us now see this notion a little more in detail: 

Let A be a set. If there is an integer n such that each element of A has a corresponding item in the set {1, 2, ..., n} (rigorously, this correspondence is a bijection, a concept that we will define later), then we say that the cardinal of A, denoted Card(A) or card(A), is a "finite cardinal" and its value is n.

Otherwise, we say that the set A has an "infinite cardinal" and we write:

$\mathrm{Card}(A) = +\infty$ (5.34)

A set A is "countable" if there is a bijection between A and $\mathbb{N}$. A set A is "at most countable" if there is a bijection between A and a part of $\mathbb{N}$. A set that is at most countable is thus either of finite cardinal or countable.

We can therefore check the following propositions:

P1. A part of a countable set is at most countable.

P2. A set containing a non-countable set is also not countable.

P3. The product of two countable sets is countable.
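Proposition P3 can be illustrated by numbering the pairs of natural numbers diagonal by diagonal. The sketch below uses the classical Cantor pairing function; this particular numbering, and the function names, are our own choice for illustration and are not taken from the text above.

```python
def cantor_pair(i, j):
    """Map the pair (i, j) of natural numbers to a single natural number."""
    return (i + j) * (i + j + 1) // 2 + j


def cantor_unpair(k):
    """Inverse of cantor_pair: recover the pair (i, j) from its number k."""
    # Find the diagonal d = i + j: the largest d with d(d+1)/2 <= k.
    d = 0
    while (d + 1) * (d + 2) // 2 <= k:
        d += 1
    j = k - d * (d + 1) // 2
    return d - j, j


# The pairs on the first n diagonals receive exactly the numbers 0 .. n(n+1)/2 - 1.
n = 10
numbers = {cantor_pair(i, j) for i in range(n) for j in range(n) if i + j < n}
print(numbers == set(range(n * (n + 1) // 2)))
```

Each pair receives one number and each number comes from one pair, which is what a bijection between $\mathbb{N} \times \mathbb{N}$ and $\mathbb{N}$ requires.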


So any infinite subset of $\mathbb{N}$ is equipotent to $\mathbb{N}$ itself, which may seem counter-intuitive at first...!

In particular, there are as many even natural numbers as natural numbers (use the bijection $f(n) = 2n$ from $\mathbb{N}$ to P, where P is the set of even natural numbers), as many relative integers as natural numbers, and as many natural numbers as rational numbers (see the section on Numbers for the proofs).

Thus we can write: 

$\mathrm{Card}(\mathbb{N}) = \mathrm{Card}(\mathbb{Z}) = \mathrm{Card}(\mathbb{Q}) = \aleph_0$ (5.36)

and more generally, any infinite part of Q is countable. 
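The first two of these bijections are easy to write down explicitly. The sketch below is only an illustration: the doubling map is the one quoted above, while the zig-zag enumeration 0, 1, -1, 2, -2, ... of the relative integers is one possible choice among many (an assumption on our side, not necessarily the one used in the section on Numbers).

```python
def double(n):
    """Bijection f(n) = 2n from N onto the set P of even natural numbers."""
    return 2 * n


def nat_to_int(n):
    """Enumerate Z as 0, 1, -1, 2, -2, ... (a bijection from N onto Z)."""
    return (n + 1) // 2 if n % 2 else -(n // 2)


def int_to_nat(z):
    """Inverse of nat_to_int."""
    return 2 * z - 1 if z > 0 else -2 * z


first_integers = [nat_to_int(n) for n in range(7)]
print(first_integers)
```

On any finite window we can check that no value is produced twice and that the inverse map brings us back to the starting number.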

Thus we have an important result: any infinite set therefore has an infinite countable part. 

Since we have proved in the section on Numbers that the set of real numbers has the "power of the continuum" and that the set of natural numbers has the transfinite cardinal $\aleph_0$, Cantor raised the question of whether there is a cardinal between the transfinite cardinal $\aleph_0$ and the cardinal of $\mathbb{R}$. In other words, we have an infinite amount of integers, and an even greater amount of real numbers. So does there exist an infinity greater than that of the integers and smaller than that of the real numbers?

The problem was posed by writing $\aleph_0$ for the cardinal of $\mathbb{N}$ and $\aleph_1$ (new) for the cardinal of $\mathbb{R}$, and by asking to prove or refute that:

$\aleph_1 = 2^{\aleph_0}$ (5.37)

according to the combinatorial law that gives the number of all subsets that we can form from a set (as we have proved before).
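For a finite set, the combinatorial law mentioned here says that a set with n elements has $2^n$ subsets. A minimal Python sketch (the helper name `power_set` is ours) lists the subsets and counts them:

```python
from itertools import chain, combinations


def power_set(elements):
    """Return all subsets of a finite collection, as frozensets."""
    items = list(elements)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1))
    return [frozenset(subset) for subset in subsets]


# Number of subsets of a set with n elements, for n = 0 .. 5.
sizes = {n: len(power_set(range(n))) for n in range(6)}
print(sizes)
```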

For the rest of his life, Cantor tried, in vain, to prove this result, which we name the "continuum hypothesis". He did not succeed and descended into madness. In 1900, at the International Congress of Mathematicians, Hilbert considered that this was one of the 23 major problems that should be resolved in the 20th century.

This problem was solved in a rather surprising way. First, in 1938, one of the greatest logicians of the 20th century, Kurt Gödel, showed that the hypothesis of Cantor is not refutable, that is to say, we can never prove that it is false. Then in 1963, the mathematician Paul Cohen closed the debate: he demonstrated that we can never prove that it is true either!!! We can rightly conclude that Cantor went mad trying to prove a statement that could not be proved.

5.1.2 Cartesian Product 

If E and F are two sets, we name "Cartesian product of E by F" the set denoted $E \times F$ (not to be confused with the vector product notation) consisting of all possible pairs $(e, f)$ where e is an element of E and f an element of F.

More formally:

$E \times F = \{(e, f) \mid e \in E \wedge f \in F\}$ (5.38)

We denote the Cartesian product of E by itself:

$E \times E = E^2$ (5.39)

and then we say that $E^2$ is the "set of pairs of elements of E".

We can perform the Cartesian product of a sequence $E_1 \times E_2 \times \dots \times E_n$ of sets and get all n-tuples $(e_1, e_2, \dots, e_n)$ where $e_1 \in E_1, e_2 \in E_2, \dots, e_n \in E_n$.

In the case where all the sets $E_i$ are identical to E, the Cartesian product is naturally denoted $E^n$. We then say that $E^n$ is the "set of all n-tuples of elements of E".

If E and F are finite then the Cartesian product $E \times F$ is finite. Moreover:

$\mathrm{Card}(E \times F) = \mathrm{Card}(E) \cdot \mathrm{Card}(F)$ (5.40)

From here we see that if the sets $E_1, E_2, \dots, E_n$ are finite then the Cartesian product $E_1 \times E_2 \times \dots \times E_n$ is finite and we have:

$\mathrm{Card}(E_1 \times E_2 \times \dots \times E_n) = \prod_{i=1}^{n} \mathrm{Card}(E_i)$ (5.41)

In particular:

$\mathrm{Card}(E^n) = [\mathrm{Card}(E)]^n$

if E is a finite set.


E1. If $\mathbb{R}$ is the set of real numbers, then $\mathbb{R}^2$ is the set of all pairs of real numbers. In the plane equipped with a coordinate system, any point M has coordinates that form an element of $\mathbb{R}^2$.

E2. When we roll two dice whose faces are numbered 1 through 6, each die can be symbolized by the set E = {1, 2, 3, 4, 5, 6}. The outcome of a roll of the dice is then an element of $E^2 = E \times E$. The cardinal of $E \times E$ is then 36. There are therefore 36 possible results when we roll two dice whose faces are numbered 1 to 6.
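The dice example can be reproduced with the standard `itertools.product` function, which enumerates a Cartesian product. This is only a small sketch; the variable names are ours.

```python
from itertools import product

E = {1, 2, 3, 4, 5, 6}

# E x E: every ordered pair (first die, second die).
outcomes = set(product(E, repeat=2))

# E x E x E: the same construction for three dice.
triples = set(product(E, repeat=3))

print(len(outcomes), len(triples))
```

The two counts agree with the relation $\mathrm{Card}(E^n) = [\mathrm{Card}(E)]^n$ for n = 2 and n = 3.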



Set theory and the concept of cardinal are the theoretical basis of relational database software.

5.1.3 Intervals 

Let M be a set of numbers such that $M \subset \mathbb{R}$ (a particular but frequent example). We have the following definitions:

D1. $x \in \mathbb{R}$ is called an "upper bound" of the set M if $x \geq m$ for all $m \in M$. Conversely, we speak about a "lower bound" (so do not confuse the concept of bound with the concept of interval!).

D2. Let $M \subset \mathbb{R}$, $M \neq \emptyset$. $x \in \mathbb{R}$ is called the "smallest upper bound" of M, denoted:

$x = \sup M$ (5.43)

if x is an upper bound of M and if for any upper bound $y \in \mathbb{R}$ of M we have $x \leq y$. Conversely, we speak about the "greatest lower bound", which we denote:

$x = \inf M$ (5.44)
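For a finite non-empty set the smallest upper bound and the greatest lower bound are simply the maximum and the minimum. The sketch below checks definitions D1 and D2 on such a set, using exact fractions; the helper names are ours. For an infinite set such as {1/n : n = 1, 2, 3, ...} the greatest lower bound 0 is not itself an element of the set, a situation that this finite sketch cannot show.

```python
from fractions import Fraction


def is_upper_bound(x, M):
    """D1: x is an upper bound of M if x >= m for every m in M."""
    return all(m <= x for m in M)


def is_lower_bound(x, M):
    """x is a lower bound of M if x <= m for every m in M."""
    return all(x <= m for m in M)


M = {Fraction(1, n) for n in range(1, 50)}  # 1, 1/2, ..., 1/49

sup_M = max(M)  # for a finite set, sup M is the largest element
inf_M = min(M)  # and inf M is the smallest element
print(sup_M, inf_M)
```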

The definitions are equivalent in the context of functional analysis (see section of the same 
name) as the functions are defined on sets. 

Indeed, let f be a function whose domain of definition I covers all of $\mathbb{R}$. We write:

$f : E \to \mathbb{R}$ (5.45)

and let $x_0 \in \mathbb{R}$.

Definitions (#66): 

D1. We say that f has a "global maximum" at $x_0$ if:

$\forall x \in E : f(x) \leq f(x_0)$ (5.46)

D2. We say that f has a "global minimum" at $x_0$ if:

$\forall x \in E : f(x) \geq f(x_0)$ (5.47)

In each of these cases, we say that f has a "global extremum" at $x_0$ (it is a concept that we often use in the sections of Analytical Mechanics and Numerical Methods!).



Figure 4.35 - Global/Local Maximum and Minimum example (source: Wikipedia) 

D3. f is "upper bounded" if there is a real number M such that $\forall x \in I, f(x) \leq M$. In this case, the function f has a smallest upper bound on its domain of definition I, traditionally denoted:

$\sup f$ (5.48)

D4. f is "lower bounded" if there is a real number M such that $\forall x \in I, f(x) \geq M$. In this case, the function f has a greatest lower bound on its domain of definition I, traditionally denoted:

$\inf f$ (5.49)

D5. We say that f is "bounded" if it is both lower bounded and upper bounded (typically the case of trigonometric functions such as sine and cosine).

5.2 Set Operations 

From at least three sets A, B, C we can build all the set operations (whose notations are due to Dedekind) existing in set theory (very useful in the study of probability and statistics).

Some of the notations below will be frequently used later in relatively complex theorems, so it is necessary to understand them deeply!

Thus, we can construct the following set operations:


5.2.1 Inclusion 

In the simplest case, we define the "inclusion" as:

$A \subset B \Leftrightarrow \forall x \, (x \in A \Rightarrow x \in B)$

In non-specialized language, this reads: A is "included" in B (is a "part" or a "subset" of B) if every x belonging to A also belongs to B:


Figure 4.36 - Visual example (Euler Diagram) of the inclusion 

where the U in the lower right corner of the figure represents the Cantor Universe.
From this follow the properties:

P1. If $A \subset B$ and $B \subset A$ then $A = B$, and vice versa.
P2. If $A \subset B$ and $B \subset C$ then $A \subset C$.
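These two properties can be tried out on small finite sets. In the sketch below, `is_subset` is a hypothetical helper that spells out the definition of inclusion; Python's built-in comparison `A <= B` on sets tests the same thing.

```python
def is_subset(A, B):
    """A is included in B if every x belonging to A also belongs to B."""
    return all(x in B for x in A)


A = {1, 2}
B = {1, 2, 3}
C = {1, 2, 3, 4}
D = {2, 1}

# P1: mutual inclusion means equality.
mutual_inclusion = is_subset(A, D) and is_subset(D, A)

# P2: inclusion is transitive.
transitive = is_subset(A, B) and is_subset(B, C) and is_subset(A, C)
print(mutual_inclusion, transitive)
```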

5.2.2 Intersection 

In the simplest case, we define the "intersection" as:

$A \cap B = \{x \mid x \in A \wedge x \in B\}$

In non-specialized language, this reads: the "intersection" of the sets A and B consists of all the elements that are both in A and in B:




More generally, if $(A_i)$ is a family of sets indexed by $i \in I$, the intersection of the $(A_i)$, $i \in I$, is denoted:

$\bigcap_{i \in I} A_i$ (5.52)

This intersection is explicitly defined by:

$\bigcap_{i \in I} A_i = \{x \mid \forall i \in I, \ x \in A_i\}$ (5.53)

That is to say, the intersection of the family of indexed sets consists of all the x that belong to every set of the family.

Given two sets A and B, we say they are "disjoint" if and only if: 

$A \cap B = \emptyset$ (5.54)

Furthermore, for finite sets:

$A \cap B = \emptyset \Leftrightarrow \mathrm{Card}(A \cup B) = \mathrm{Card}(A) + \mathrm{Card}(B)$ (5.55)

Mathematicians then write:

$A \sqcup B$

and name it "disjoint union".

We sometimes joke that knowledge is built on the disjunction... (those who understand will 

Definition (#67): A collection $S = \{S_i\}$ of non-empty sets forms a "partition" of a set A if the following properties hold:

P1. $\forall S_i, S_j \in S$ with $i \neq j$: $S_i \cap S_j = \emptyset$
P2. $A = \bigcup_{S_i \in S} S_i$

The set of even numbers and the set of odd numbers form a partition of $\mathbb{Z}$.
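The two properties of a partition translate directly into a small check. The sketch below is an illustration under one assumption: since $\mathbb{Z}$ is infinite, we only test a finite window of integers; the function name `is_partition` is ours.

```python
def is_partition(blocks, A):
    """Check that the non-empty sets in `blocks` form a partition of A."""
    blocks = [set(block) for block in blocks]
    if any(len(block) == 0 for block in blocks):
        return False
    # P1: the blocks are pairwise disjoint.
    pairwise_disjoint = all(
        blocks[i].isdisjoint(blocks[j])
        for i in range(len(blocks)) for j in range(i + 1, len(blocks)))
    # P2: the union of the blocks is the whole set A.
    covers = set().union(*blocks) == set(A)
    return pairwise_disjoint and covers


window = range(-10, 11)  # finite window of Z
evens = {z for z in window if z % 2 == 0}
odds = {z for z in window if z % 2 != 0}
print(is_partition([evens, odds], window))
```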

The intersection law is trivially a commutative law (see further below the definition of the 
concept of "law") as: 

$A \cap B = B \cap A$


5.2.3 Union 

In the simplest case, we define the "union" (also sometimes named "merge") as:

$A \cup B = \{x \mid x \in A \vee x \in B\}$

In non-specialized language, this reads: the "union" (or "merge") of the sets A and B is the set of elements that are in A plus those that are in B.


More generally, if $(A_i)$ is a family of sets indexed by $i \in I$, the union of the $(A_i)$, $i \in I$, is denoted:

$\bigcup_{i \in I} A_i$ (5.59)

This union is explicitly defined by:

$\bigcup_{i \in I} A_i = \{x \mid \exists i \in I, \ x \in A_i\}$ (5.60)

That is to say, the union of the family of indexed sets consists of all the x for which there is an index i such that x belongs to the set $A_i$.

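We have the following distributive properties:

$\left(\bigcup_{i \in I} A_i\right) \cap B = \bigcup_{i \in I} (A_i \cap B)$

$\left(\bigcap_{i \in I} A_i\right) \cup B = \bigcap_{i \in I} (A_i \cup B)$

The law of union $\cup$ is a commutative law (see further below the definition of the concept of "law") as:

$A \cup B = B \cup A$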

We also name "idempotence laws" the relations (noted here for general culture):

$A \cap A = A$
$A \cup A = A$

and "absorption laws" the relations:

$A \cap (A \cup B) = A$
$A \cup (A \cap B) = A$

The laws of intersection and union are associative, such that:

$A \cap (B \cap C) = (A \cap B) \cap C$
$A \cup (B \cup C) = (A \cup B) \cup C$

and distributive such that:

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$










If we recall the concept of "cardinal" (see above) we have, with the previously defined operations, the following relation:

$\mathrm{Card}(A \cup B) = \mathrm{Card}(A) + \mathrm{Card}(B) - \mathrm{Card}(A \cap B)$ (5.72)
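The laws listed above, together with the cardinal relation, can be verified on concrete finite sets. In the sketch below the three sets are arbitrary examples of ours; Python writes the intersection as `&` and the union as `|`.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6, 7}

laws = {
    "idempotence": (A & A == A) and (A | A == A),
    "absorption": (A & (A | B) == A) and (A | (A & B) == A),
    "commutativity": (A & B == B & A) and (A | B == B | A),
    "associativity": (A & (B & C) == (A & B) & C)
                     and (A | (B | C) == (A | B) | C),
    "distributivity": (A & (B | C) == (A & B) | (A & C))
                      and (A | (B & C) == (A | B) & (A | C)),
    "cardinal of the union": len(A | B) == len(A) + len(B) - len(A & B),
}

for name, holds in laws.items():
    print(name, holds)
```

A check on one example is of course not a proof, but it is a quick way to catch a mis-remembered formula.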

5.2.4 Difference 

In the simplest case, we define the "difference" as:

$A \setminus B = \{x \mid x \in A \wedge x \notin B\}$

In non-specialized language, this reads: the "difference" of the sets A and B consists of all the elements found only in A (thus excluding those of B):


5.2.5 Symmetric Difference 

Let U be a set. For any subsets A and B of U we define the "symmetric difference" $A \Delta B$ between A and B by:

$A \Delta B = (A \setminus B) \cup (B \setminus A)$

In non-specialized language, this reads: the "symmetric difference" of the sets A and B consists of all the items that are only in A and those found only in B (we set aside the elements that are common):




So as we can see we have:

$A \Delta B = (A \cup B) \setminus (A \cap B)$

Some trivial properties are given below:

P1. Commutativity: $A \Delta B = B \Delta A$

P2. Complementarity (see definition below): $A^c \Delta B^c = A \Delta B$
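Python's set type provides the symmetric difference as the operator `^`, which makes the relations above easy to check. In this sketch the universe U and the sets A and B are arbitrary examples of ours, and the complement is taken relative to U.

```python
U = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

by_definition = (A - B) | (B - A)
by_union_minus_intersection = (A | B) - (A & B)

commutative = (A ^ B) == (B ^ A)
# P2: the complements (relative to U) have the same symmetric difference.
complements_agree = ((U - A) ^ (U - B)) == (A ^ B)

print(sorted(by_definition))
```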

5.2.6 Product 

In the simplest case, we define the "set product" or "Cartesian product" as:

$A \times B = \{(x, y) \mid x \in A \wedge y \in B\}$

In non-specialized language, this reads: the "product" (not to be confused with the multiplication or the cross product of vectors) of two sets A and B is the set of pairs in which each element of the first set is combined with every element of the second set.

The product of the set of real numbers with itself, for example, generates the plane, where each element is located by its coordinates along the X and Y axes.

We often find product sets in mathematics and physics when we work with functions. For example, a function of two real variables which gives a real output will be written:

$f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}, \quad (x, y) \mapsto z$ (5.77)

or more simply:

$f(x, y) = z$


5.2.7 Complementarity 

In the simplest case, we define the "complementarity" as:

$\forall A \subset U : \overline{A} = \{x \mid x \in U, \ x \notin A\}$

In non-specialized language, this reads: the "complement" is defined by taking a set U and a subset A of U; the complement of A in U is then the set of elements that are in U but not in A:


Figure 4.41 - Visual example (Euler diagram) of the difference 

Other notations for the complement that are sometimes found in the literature and in the rest of this book are (depending on the context, to avoid confusion with other notations):

$\complement_U A$ or $A^c$ (5.80)

or in the particular example above, we could also just write $B \setminus A$.

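We have the following properties for all $A_i$ included in any B:

$\complement_B \left(\bigcup_{i \in I} A_i\right) = \bigcap_{i \in I} \complement_B A_i$

$\complement_B \left(\bigcap_{i \in I} A_i\right) = \bigcup_{i \in I} \complement_B A_i$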

Here are some trivial properties regarding complementarity:

$\overline{\overline{A}} = A$ (5.83)

$A \cap \overline{A} = \emptyset$ (5.84)

$A \cup \overline{A} = U$ (5.85)

There are other very important relations that also apply to Boolean logic (see section Logic Systems). If we consider three sets A, B, C we have:

$A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C)$ (5.86)

$A \setminus (B \setminus C) = (A \setminus B) \cup (A \cap C)$ (5.87)



and the famous "De Morgan's laws" in set form (see section Logic Systems), which are given by the relations:

$\overline{A \cap B} = \overline{A} \cup \overline{B}$
$\overline{A \cup B} = \overline{A} \cap \overline{B}$
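The complement relations and De Morgan's laws can be checked on small sets. The sketch below assumes a finite universe U of our own choosing and defines a hypothetical helper `complement` relative to it.

```python
U = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8}


def complement(S):
    """Complement of S in the universe U."""
    return U - S


checks = {
    "double complement": complement(complement(A)) == A,
    "A and its complement are disjoint": A & complement(A) == set(),
    "A and its complement cover U": A | complement(A) == U,
    "De Morgan (intersection)": complement(A & B) == complement(A) | complement(B),
    "De Morgan (union)": complement(A | B) == complement(A) & complement(B),
    "A minus (B inter C)": A - (B & C) == (A - B) | (A - C),
    "A minus (B minus C)": A - (B - C) == (A - B) | (A & C),
}

for name, holds in checks.items():
    print(name, holds)
```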




We would like to indicate, before moving on to another topic, that a significant number of working adults (mostly managers) who have forgotten the previously defined concepts after leaving high school must study them again when they learn the SQL language (Structured Query Language), which is the most common language worldwide for querying corporate database servers in the 20th and 21st centuries. Most of them learn in training centers the following scheme to build queries with joins:

(Figure: seven query templates with joins, each of the form SELECT <select_list> FROM TableA A ... ON A.Key = B.Key)

Figure 4.42 - Common SQL query expressions with joins
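The link between joins and the set operations of this section can be sketched in Python rather than in SQL. Everything below is a hypothetical illustration: the two small tables, their contents and the variable names are invented, and we assume that each key appears at most once per table (otherwise a real join returns several rows per key).

```python
# Two hypothetical tables, each stored as {Key: value}.
table_a = {1: "apple", 2: "pear", 3: "plum"}
table_b = {2: "red", 3: "blue", 4: "green"}

keys_a, keys_b = set(table_a), set(table_b)

inner_keys = keys_a & keys_b       # keys matched on both sides (intersection)
only_a_keys = keys_a - keys_b      # rows of A without a match in B (difference)
full_outer_keys = keys_a | keys_b  # every key of either table (union)
unmatched_keys = keys_a ^ keys_b   # keys without a match (symmetric difference)

# Rows of the inner join: values of both tables for each matched key.
inner_rows = [(key, table_a[key], table_b[key]) for key in sorted(inner_keys)]
print(inner_rows)
```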

5.3 Functions and Applications 

Definition (#68): In mathematics, an "application" (or "function"), typically denoted f in analysis or A in linear algebra, is the data of two sets, the departure set E and the arrival set F (or "image of E"), and of a relation associating to each element x of the departure set one and only one element of the arrival set, which we call the "image of x by f". In the field of analysis we write it f(x), or f(E) to make the departure set explicit. We name "images" the elements of f(E), and the elements of E are called the "antecedents".

Then we say that f is an application from E to F, denoted:

$f : E \to F$ (5.89)

(remember the first arrow/sagittal diagram presented at the beginning of this section), or we also say that this is an application with arguments in E and values in F.



Note: The term "function" is often used for applications with scalar numeric values, real or complex, that is to say when the arrival set is $\mathbb{R}$ or $\mathbb{C}$. We then speak of a "real function" or a "complex function". In the case of vectors we prefer to use the word "application", as we already mentioned in the definition.

Definitions (#69): 

D1. The "graph" or "plot" (also called "graphic" or "representative") of an application or function $f : E \to F$ is the subset of the Cartesian product $E \times F$ consisting of the pairs $(x, f(x))$ for x varying in E. The data of the graph of f determines its starting set (by projection on the first argument, often denoted x) and its image (by projection on the second argument, often denoted y).

D2. If the triplet f = (E, F, T) is a function, where E and F are two sets and $T \subset (E \times F)$ is a graph, E and F are the source and the target of f respectively. The "definition domain" or "departure set" of f is:

$D_f = I = \{x \in E \mid \exists y \in F, \ (x, y) \in T\}$ (5.90)

D3. Given three non-empty sets E, F, G, any function from $E \times F$ to G is named a "composition law" of $E \times F$ with values in G.

D4. An "internal composition law" (or simply "internal law") in E is a composition law of $E \times E$ with values in E (that is to say, this is the case E = F = G).


The subtraction in $\mathbb{N}$ is not an internal composition law, although it is part of the four basic high-school arithmetic operators. But the addition in $\mathbb{N}$ is such an internal law.

D5. An "external composition law" (or simply "external law") in E is a composition law of $F \times E$ with values in E, where F is a set distinct from E. In general, F is a set called the "scalar set".

In the case of a vector space (see the definition much further below) the multiplication of a vector (whose components are based on a given set) by a real scalar is an example of an external composition law.

An external composition law with values in E is also called an "action of F on E". The set F is then the domain of operators. We also say that F operates on E (keep in mind the example of the vectors mentioned above).



D6. We name "image of f", and denote Im(f), the subset defined by:

$\mathrm{Im}(f) = f(E) = \{y \in F \mid \exists x \in E, \ y = f(x)\}$

Thus, "the image" of a function $f : E \to F$ is the collection of the f(x) for x running through E. It is a subset of F.

And we name "kernel of f", and denote ker(f), the very important subset in mathematics defined by:

$\ker(f) = f^{-1}(\{0\}) = \{x \in E \mid f(x) = 0\}$


According to the figure (you must deeply understand this concept because we will reuse the ker many times to prove theorems that have important practical applications later in various chapters):

Figure 4.43 - ker concept of a function

R1. ker(f) is derived from the German "Kern", simply meaning "kernel".

R2. Normally the notations Im and ker are reserved for homomorphisms of groups, rings and fields, and for linear applications between vector spaces and modules, etc. (see further below). We do not usually use them for arbitrary applications between arbitrary sets. But ... it does not really matter for the moment at this level of the book.
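On a finite departure set, the image and the kernel defined above can be computed by simple enumeration. The function f and the set E below are arbitrary examples of ours, chosen only to illustrate the two definitions (keeping in mind remark R2 on the usual scope of these notations).

```python
E = range(-3, 4)  # departure set {-3, ..., 3}


def f(x):
    """An arbitrary example function with integer values."""
    return x * x - 4


image = {f(x) for x in E}             # Im(f): all values reached by f
kernel = {x for x in E if f(x) == 0}  # ker(f): all x sent to 0

print(sorted(image), sorted(kernel))
```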

Applications and functions can have a phenomenal number of properties. Below you can find some easy ones that are part of the general knowledge of the physicist (for more information about what a function is, see the section on Functional Analysis).

Let f be an application or function from a set E to a set F; then we have the following properties:


P1. An application or function is said to be "surjective" if:

Any element y of F is the image by f of at least (we emphasize the "at least") one element of E. We thus say that it is a "surjection" from E to F. It follows from this definition that an application or function $f : E \to F$ is surjective if and only if F = Im f. In other words, we also write this definition as follows:

$\forall y \in F, \ \exists x \in E : y = f(x)$ (5.93)


Figure 4.44 - Schematic representation of a surjective application or function 

P2. An application or function is said to be "injective" if:

Any element y of F is the image by f of at most (we emphasize the "at most") a single element of E. We thus say that f is an injection from E to F. It follows from this definition that an application or function $f : E \to F$ is injective if and only if the relations $x_1, x_2 \in E$ and $f(x_1) = f(x_2)$ imply $x_1 = x_2$. In other words: an application or function for which two distinct elements have distinct images is called injective. Equivalently, an application or function is injective if one of the following equivalent properties holds:

P2.1 $\forall x, y \in E : f(x) = f(y) \Rightarrow x = y$
P2.2 $\forall x, y \in E : x \neq y \Rightarrow f(x) \neq f(y)$
P2.3 $\forall y \in F$, the equation in x, $y = f(x)$, has at most one solution in E

All this can be summarized by:




Figure 4.45 - Schematic representation of an injective application or function 

P3. An application or function is said to be "bijective" or "total application/function" if:

An application or function f from E to F is both injective and surjective. In this case, for any element y of F, the equation y = f(x) admits in E a single (not "at least" and not "at most") pre-image x. We also write this as:

$\forall y \in F, \ \exists! x \in E : y = f(x)$ (5.94)

This is illustrated by:

Figure 4.46 - Schematic representation of a bijective application or function 

We are thus naturally led to define a new application from F to E, called the "inverse function" or "reciprocal function" of f and denoted $f^{-1}$, which to every element y of F associates the unique pre-image x in E (also sometimes called the "solution") of the equation y = f(x). In other words:

$x = f^{-1}(y)$ (5.95)

The existence of an inverse (reciprocal) function or application implies that the graph of a bijective function or application (in the set of real numbers...) and that of its inverse (reciprocal) are symmetric with respect to the line of equation y = x.

Indeed, we notice that since y = f(x) is equivalent to $x = f^{-1}(y)$, these equations imply that the point (x, y) is on the graph of f if and only if the point (y, x) is on the graph of $f^{-1}$.


Figure 4.47 - Bijective function example 

As you can see for example in the figure below with the sinus function (see section 


Take the case of a holiday resort where a group of tourists must be housed in a hotel. Each way of allocating these tourists to the hotel rooms may be represented by an application from the set of tourists to the set of rooms (to each tourist is assigned a room).

• Tourists want the application to be injective, that is to say, each of them has a room to himself. This is only possible if the number of tourists does not exceed the number of rooms.

• The hotel manager hopes that the application is surjective, that is to say, each room is occupied. This is only possible if there are at least as many tourists as rooms.

• If it is possible to spread the tourists so that there is only one per room and all the rooms are occupied, the application will then be both injective and surjective, that is to say bijective.
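The hotel example can be written as a small program in which an application is stored as a dictionary from tourists to rooms. The names of the tourists, the room numbers and the helper functions below are hypothetical, invented only for this sketch.

```python
def is_injective(assignment):
    """No two tourists share the same room."""
    rooms_used = list(assignment.values())
    return len(set(rooms_used)) == len(rooms_used)


def is_surjective(assignment, rooms):
    """Every room of the hotel is occupied."""
    return set(assignment.values()) == set(rooms)


def is_bijective(assignment, rooms):
    return is_injective(assignment) and is_surjective(assignment, rooms)


rooms = {101, 102, 103}
one_per_room = {"Ann": 101, "Bob": 102, "Eve": 103}
shared_room = {"Ann": 101, "Bob": 101, "Eve": 102, "Dan": 103}
empty_room = {"Ann": 101, "Bob": 102}

print(is_bijective(one_per_room, rooms))
```

With four tourists and three rooms the application cannot be injective, and with two tourists and three rooms it cannot be surjective, in agreement with the counting remarks above.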




R1. It follows from the definitions above that f is bijective in the set of real numbers if and only if any horizontal line intersects the graph of the function at a single point. This leads us to the following second remark:

R2. A continuous application defined on an interval that satisfies the horizontal line test is strictly increasing or strictly decreasing on its whole domain.

P4. An application or function is named "composite application" or "composite function" if:

Let $\varphi$ be an application or function from E to F and $\psi$ an application or function from F to G. The application or function that associates to each element x of the set E the element $\psi(\varphi(x))$ of G is named the "composed application" of $\varphi$ and $\psi$ and is denoted by:

$\psi \circ \varphi$ (5.96)

where the symbol "$\circ$" is called "round" (not to be confused with the scalar product that we will see later in the section of Vectorial Calculus). Thus, the above relation is written "psi round phi" but has to be read "phi round psi" (...), since $\varphi$ is applied first. So:

$(\psi \circ \varphi)(x) = \psi(\varphi(x))$ (5.97)

Let, moreover, $\chi$ be an application (not a function!) from G to H. We check immediately that the composition operation is associative for applications (for more details see the section of Linear Algebra):

$\chi \circ (\psi \circ \varphi) = (\chi \circ \psi) \circ \varphi$ (5.98)

This allows us to omit parentheses and write more simply:

$\chi \circ \psi \circ \varphi$ (5.99)

In the particular case where $\varphi$ is an application or function from E to E, we denote by $\varphi^k$ the composed application $\varphi \circ \varphi \circ \dots \circ \varphi$ (k times).
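Composition, its associativity and the iterated application can be tried out with ordinary Python functions. In this sketch `phi`, `psi` and `chi` are arbitrary example functions of ours on the integers, and `compose` and `power` are hypothetical helper names.

```python
def compose(psi, phi):
    """Return psi o phi, the application x -> psi(phi(x)) (phi is applied first)."""
    return lambda x: psi(phi(x))


def power(phi, k):
    """Return phi o phi o ... o phi (k times)."""
    def iterated(x):
        for _ in range(k):
            x = phi(x)
        return x
    return iterated


def phi(x):
    return x + 1


def psi(x):
    return 2 * x


def chi(x):
    return x * x


left = compose(chi, compose(psi, phi))   # chi o (psi o phi)
right = compose(compose(chi, psi), phi)  # (chi o psi) o phi

print(compose(psi, phi)(3), compose(phi, psi)(3))
```

The two printed values differ, which recalls that composition is associative but in general not commutative.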

What’s important in what we have seen until now in this section is that all defined properties 
listed above are applicable to Numbers’ Sets. 

Let us see a concrete and very powerful example: 


5.3.1 Cantor-Bernstein Theorem 

Warning! This theorem, whose result may seem trivial, is not necessarily easy to approach (its mathematical formalism is not very aesthetic...). We advise you to read the proof slowly and to picture the sagittal diagrams in your head during the development.

Here is the statement to prove:

Theorem 4.27. Let X and Y be two sets. If there is an injection (remember the definition of an injective function or application above) from X to Y and another from Y to X, then both sets are in bijection (remember the definition of a bijective function or application above). It is therefore an antisymmetric relation.

This is illustrated by: 

Figure 4.48 - Representation of an antisymmetric relation

For the proof we first need to demonstrate rigorously a lemma (intuitively obvious... but not formally) whose statement is as follows:

Lemma 4.27.1. Let X, Y, Z be three sets such that $X \subset Z \subset Y$. If X and Y are in bijection through a function f, then X and Z are in bijection through a function g.

An example of application of this lemma is given by the set of natural numbers and the set of rational numbers, which are in bijection (see the section of Number Theory for the proof). Therefore, the set of relative integers is also in bijection with the set of natural numbers, since $\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q}$.

Proof 4.27.1. First, formally, we take a function f from Y to X that is bijective:

$f : Y \to X$ (5.100)



To continue we need a set A, defined as the union of the images, under the iterated functions f (of the kind f(f(f(...)))), of the set Z (remember that $Z \subset Y$) from which we exclude the elements of X (a set that we will denote for this proof: $Z - X$).

In other words (if the first form is not clear...) we define the set A as the union of the images of $(Z - X)$ by the applications $f \circ f \circ \dots \circ f$. We write this as:

$A = \bigcup_{n=1}^{\infty} f^n(Z - X)$ (5.101)


Because $f : Y \to X$ and $(Z - X) \subset Y$ we have by construction $A \subset X$ and thus $((Z - X) \cup A) \subset Z$. Note that we also have:

$A = \bigcup_{n=1}^{\infty} f^n(Z - X) \Rightarrow f(A) = f\left(\bigcup_{n=1}^{\infty} f^n(Z - X)\right) = \bigcup_{n=1}^{\infty} f\left(f^n(Z - X)\right) = \bigcup_{n=1}^{\infty} f^{n+1}(Z - X)$

and by reindexing:

$f(A) = \bigcup_{n=2}^{\infty} f^n(Z - X)$ (5.103)


We then have (picturing the arrow diagrams in your head can help at this stage of the proof):

$$f((Z - X) \cup A) = A \tag{5.104}$$

We can elegantly demonstrate this last relation:

$$f((Z - X) \cup A) = f(Z - X) \cup f(A) = f(Z - X) \cup \bigcup_{n=2}^{\infty} f^n(Z - X) = \bigcup_{n=1}^{\infty} f^n(Z - X) = A$$


Since $Z$ can be partitioned (nothing stops us from doing this!) into two disjoint subsets
$(Z - X) \cup A$ and $(X - A)$, and without forgetting that $X \subset Z \subset Y$ and $A \subset X$,
we define the function $g$ (we do not give more information about it yet) such that:

$$g : Z \to X \tag{5.106}$$

and for every element $a$ of the part $((Z - X) \cup A) \subset Z$ we set:

$$\forall a \in ((Z - X) \cup A) : \; g(a) = f(a) \tag{5.107}$$

This means that because $((Z - X) \cup A) \subset Z$ and $Z \subset Y$ we can apply the bijective function
$f$ (remember that $f : Y \to X$) in place of the function $g$ to any element of $((Z - X) \cup A)$.

We also set, for every element $a$ of the part $(X - A)$ (remember that $A \subset X$):

$$\forall a \in (X - A) : \; g(a) = a \tag{5.108}$$

The application $g$ is then bijective: its restrictions to $((Z - X) \cup A)$ and $(X - A)$ are $f$
and the identity, which are injective, and their images $A$ and $X - A$ are disjoint and together
cover $X$.

Finally there exists, by construction, a bijection between $X$ and $Z$.


□ Q.E.D. 
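
The construction of $g$ in the lemma can be tried out on a concrete case. The following is only a minimal sketch under assumptions that are not in the text: we take $Y = \{0, 1, 2, \ldots\}$, $Z = \{1, 2, 3, \ldots\}$, $X = \{2, 3, 4, \ldots\}$, the bijection $f(n) = n + 2$ from $Y$ to $X$, and we truncate every set at a hypothetical bound `N` so that it fits in memory.

```python
# Illustrative sketch only: sets, bound N and f are assumptions chosen for the example.
N = 50  # truncation bound for the (infinite) sets

def f(n):
    """Bijection from Y = {0, 1, 2, ...} onto X = {2, 3, 4, ...}."""
    return n + 2

Y = set(range(0, N))
Z = set(range(1, N))   # X is a subset of Z, Z is a subset of Y
X = set(range(2, N))

# A = union over n >= 1 of f^n(Z - X), truncated at N.
A = set()
layer = Z - X          # here {1}
while True:
    layer = {f(a) for a in layer if f(a) < N}
    if not layer:
        break
    A |= layer

def g(a):
    """g equals f on (Z - X) U A and the identity on X - A."""
    return f(a) if a in (Z - X) | A else a

# Image of Z under g, restricted to the truncated range.
image = {g(a) for a in Z if g(a) < N}
print(sorted(A)[:5], image == X)
```

On this example $A$ is the set of odd numbers from 3 upwards, $g$ shifts the odd elements of $Z$ by 2 and leaves the even ones in place, and the image of $Z$ is exactly $X$ within the truncated range.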

Now that we have proved the lemma, let us recall the assumptions of the Cantor-Bernstein
theorem and use the result of the lemma:

Consider $\varphi$ an injection from $X$ to $Y$ and $\psi$ an injection from $Y$ to $X$ with $X \subset Y$.

We thus have:

$$\varphi(X) \subset Y \quad \text{and} \quad \psi(Y) \subset X \tag{5.109}$$

so (we recognize here the statement of the lemma):

$$\psi(\varphi(X)) \subset \psi(Y) \subset X \tag{5.110}$$

$$a \circ (b * c) = (a \circ b) * (a \circ c)$$
$$(a * b) \circ c = (a \circ c) * (b \circ c)$$

$(M, *)$ is a magma if:

• $*$ is an operation
• $*$ is an internal law

$$x * a = x * b \Rightarrow a = b$$

$(M, *)$ is a monoid if:

• $*$ is associative
• there exists a neutral element $n \in M$ for $*$

$$a + b = c$$

$$\sum_{i=1}^{a} 1 + \sum_{i=1}^{b} 1 = \sum_{i=1}^{a+b} 1$$



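$$\left(\sum_{i=1}^{a} 1 + \sum_{i=1}^{b} 1\right) + c = \left(\sum_{i=1}^{a+b} 1\right) + c = 0$$

$$\sum_{i=1}^{a+b} 1 = -c$$

$$a + b = -c$$

$$a' \cdot a = n = 1$$

$$a' = \frac{1}{a}$$

$(G, *)$ is a group if:

• $*$ is associative
• there exists a neutral element in $G$ for $*$
• every $x \in G$ has a symmetric element for $*$

$$(p, q) \in \mathbb{Z}, \; q \neq 0$$

$$(p/q)/r \neq p/(q/r)$$

$$\frac{a}{b} = \frac{b}{a} = \pm 1$$

$$a = \pm b$$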


$$(a + ib) + (c + id) = a + ib + c + id = a + c + ib + id = (a + c) + i(b + d)$$

$$(a + ib) - (c + id) = a + ib - c - id = a - c + ib - id = (a - c) + i(b - d)$$

$$(a + ib)(c + id) = ac + aid + ibc + ibid = ac + iad + ibc + i^2 bd = ac + iad + ibc + (-1)bd = (ac - bd) + i(ad + bc)$$

$$\frac{1}{x + iy} = \frac{x - iy}{(x + iy)(x - iy)} = \frac{x - iy}{x^2 - (iy)^2} = \frac{x - iy}{x^2 + y^2} = \frac{x}{x^2 + y^2} - i\frac{y}{x^2 + y^2}$$

$(A, +, *)$ is a ring if:

• $(A, +)$ is an abelian group
• the law $*$ is associative
• the law $*$ is distributive with respect to the law $+$

$$(\mathbb{Z}, +)$$

$$(\mathbb{C}, +)$$



$$(\mathbb{Z}, +, \times) \quad (\mathbb{Q}, +, \times) \quad (\mathbb{R}, +, \times) \quad (\mathbb{C}, +, \times)$$

$(K, +, *)$ is a field if:

• $(K, +, *)$ is a unitary ring
• $(K - \{0\}, *)$ is a group

$$(\mathbb{Z}, +, \times) \quad (\mathbb{Q}, +, \times) \quad (\mathbb{R}, +, \times) \quad (\mathbb{C}, +, \times)$$

$$(\mathbb{R}, +, \times) \quad (\mathbb{C}, +, \times)$$

$(\mathbb{R}, +)$ is a commutative group:

• Addition is an internal operation
• Addition is associative
• Addition is commutative
• There exists a neutral element for addition: 0
• Every real number has an opposite

$(\mathbb{R} - \{0\}, \times)$ is a commutative group:

• Multiplication is an internal operation
• Multiplication is associative
• Multiplication is commutative
• There exists a neutral element for multiplication: 1
• Every non-zero real number has an inverse

• Multiplication is distributive with respect to addition

The relation $\leq$ is an order relation:

• The relation $\leq$ is reflexive
• The relation $\leq$ is antisymmetric
• The relation $\leq$ is transitive



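• $(E, +)$ is an abelian group
• $*$ is an external law defined by:

$$* : E \times K \to E, \quad (x, a) \mapsto a * x$$

$$x = t^2 + 2t + 3, \quad y = -t + 5$$

$$F \neq \emptyset$$

$$\forall (x, y) \in F^2, \; x + y \in F$$

$$F \neq \emptyset$$

$$\forall (\vec{x}, \vec{y}) \in F^2, \; \vec{x} + \vec{y} \in F$$

$$\forall \lambda \in K, \; \forall \vec{x} \in F, \; \lambda \vec{x} \in F$$

$(A, \mathbb{C}, +, \cdot, \times)$ is a $\mathbb{C}$-algebra if:

• $(A, +, \cdot)$ is a $\mathbb{C}$-vector space
• $(A, +, \times)$ is a unitary ring
• $\forall \lambda \in \mathbb{C}, \; \forall a, b \in A : \; (\lambda a) \cdot b = a \cdot (\lambda b) = \lambda (a \cdot b)$

$$f(a * b) = f(a) \circ f(b) \quad \forall a, b \in A$$

$$f(a * b) = f(a) \circ f(b) \quad \forall a, b \in A$$

$$f(1_A) = 1_B$$

$$f(1_A) = 1_B$$

$$f(a + a') = f(a) + f(a')$$
$$f(a \cdot a') = f(a) f(a')$$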



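$$\ker(f) = \{0\}$$

$$f(x + y) = f(x) * f(y) \quad \forall x, y \in A$$
$$f(1_A) = 1_B$$

$$f(x^{-1}) = (f(x))^{-1} \quad \forall x \in A$$

$$\forall x \in A : \; 1_B = f(1_A) = f(x + x^{-1}) = f(x) * f(x^{-1})$$

$$\forall x \in A : \; 1_B = f(x) * f(x^{-1})$$
$$\forall x \in A : \; f(x) = f\left((x^{-1})^{-1}\right) = \left(f(x^{-1})\right)^{-1}$$

$$1 = f(1) = f(a \cdot b) = f(a) \cdot f(b)$$

$$f(x + y) = f(x) + f(y) \quad \forall (x, y) \in A$$
$$f(\lambda x) = \lambda f(x) \quad \forall x \in A, \; \forall \lambda \in K$$

$$f(a + a') = f(a) + f(a') = 0 + 0 = 0$$

$$f(ra) = f(r) f(a) = f(r) \cdot 0 = 0$$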


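$$\{ax \mid x \in A\}$$

$$a = qr + r'$$

$$I = r\mathbb{Z}$$

$$m \cdot a = d \cdot n \cdot a \in d\mathbb{Z}$$

$$f(n) = f(\underbrace{1 + \ldots + 1}_{n \text{ times}}) = \underbrace{1_A + \ldots + 1_A}_{n \text{ times}} = n \cdot 1_A$$

$$f(n) = n \cdot 1_A$$

$$(1, 0) \cdot (0, 1) = 0$$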




Probability is the measure of the likelihood that an event will occur, and the calculation
of probabilities therefore handles random phenomena (known more aesthetically as
"stochastic processes" when they are time-dependent), that is to say, phenomena that do
not always lead to the same outcome and whose implications and occurrences can be studied
using numbers. However, even if these phenomena have variable outcomes, depending on chance,
we observe a certain statistical regularity.

Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 
1 indicates certainty). The higher the probability of an event, the more certain we are that the 
event will occur. 

The concepts related to probabilities have been given an axiomatic mathematical formalization
in probability theory (see further below), which is used widely in such areas of study
as mathematics, statistics, finance, gambling, science (in particular physics), artificial
intelligence/machine learning, computer science, game theory, and philosophy to, for example,
draw inferences about the expected frequency of events. Probability theory is also used to
describe the underlying mechanics and regularities of complex systems.

Definitions (#70): There are several ways to define a probability. Mainly we are talking about:

D1. "Experimental or inductive probability", which is the probability derived from the whole
population.

D2. "Theoretical or deductive probability", which is the probability known through the study of
the underlying phenomenon without experimentation. It is therefore an "a priori" knowledge,
as opposed to the previous definition, which rather refers to a notion of "a posteriori"
probability.

As it is not always possible to determine a priori probabilities, we are often asked to perform
experiments. We must then be able to pass from the first definition to the second. This passage
is supposed to be possible in terms of a limit (with a population sample whose size approaches
the size of the whole population).

The formal modeling of the probability calculus was invented by A.N. Kolmogorov in a book
published in 1933. This model is based on the probability space $(U, A, P)$ that we will define a
little further and that we can relate to measure theory (see section Measure Theory).
However, probabilities were already studied from a scientific point of view by Fermat and
Pascal in the mid 17th century.



If you have a teacher or trainer who dares to teach statistics and probabilities with examples
based on gambling (cards, dice, matches, coin tosses, etc.), report it to whom it may concern,
because it would mean that he has no experience in the field and will teach you anything, no
matter how (examples should normally be based on industry, economy or R&D, in short: areas
used daily in companies, but especially not on gambling...!).

6.1 Event Universe 

Definitions (#71): 

D1. The "universe of events", or "universe of observables", $U$ is the set of all possible
outcomes (results), called "elementary events", that can occur during a given random trial.
The universe can be discrete (finite or countably infinite) or continuous (uncountable).

D2. Any "event" $A$ is a set of elementary events and is a subset of the universe $U$. An event
may consist of only a single elementary event.


Consider the universe of all possible blood groups; then the event $A$ "the individual
is Rh positive" is represented by:

$$A = \{A+, B+, AB+, O+\} \subset U$$

while the event $B$ "the individual is the universal donor" is represented by:

$$B = \{O-\} \subset U$$

thus being an elementary event.

D3. Let $U$ be a universe and $A$ an event. We say that the event $A$ "occurs" (or "is realized")
if during the trial the outcome $i$ ($i \in U$) occurs and $i \in A$. Otherwise, we say
that $A$ "was not realized".

D4. The empty subset $\emptyset$ of $U$ is called the "impossible event". Indeed, whatever outcome
$i$ occurs during a trial, we always have $i \notin \emptyset$, so the event $\emptyset$ never occurs.

If $U$ is finite, or countably infinite, any subset of $U$ is an event; this is no longer
true if $U$ is uncountable (we will see why in the chapter Statistics).



D5. The set $U$ is also called the "certain event". Indeed, whatever outcome $i$ occurs at the end
of the trial, we always have $i \in U$ (since $U$ is the universe of events). The event $U$
therefore always occurs.

D6. Let $A$ and $B$ be two subsets of $U$. We know that $A \cup B$ and $A \cap B$ are both
subsets of $U$, and therefore events, named respectively "joint events" and "disjoint events".

If two events $A$ and $B$ are such that:

$$A \cap B = \emptyset \tag{6.1}$$

the two events cannot both occur during the same trial; we then say that they are "mutually
exclusive events".

Otherwise, if:

$$A \cap B \neq \emptyset$$

the two events can both occur during the same trial (the possibility of seeing a black cat when
we pass under a ladder, for example); we say conversely that they are "compatible events" (not
to be confused with "independent events", defined further below).
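
The definitions of this section map directly onto set operations. Here is a small sketch using Python sets; the eight-element universe of ABO/Rh blood groups and the helper names `occurs` and `mutually_exclusive` are assumptions made for the illustration, not notation from the text.

```python
# Universe of blood groups (assumed here to be the eight ABO/Rh combinations).
U = {"A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"}

rh_positive = {"A+", "B+", "AB+", "O+"}   # event A of the example above
universal_donor = {"O-"}                  # event B, a single elementary event

def occurs(event, outcome):
    """An event is realized when the observed outcome belongs to it (D3)."""
    return outcome in event

def mutually_exclusive(a, b):
    """Two events are mutually exclusive when their intersection is empty (6.1)."""
    return a.isdisjoint(b)

print(rh_positive <= U, universal_donor <= U)            # both are subsets of U
print(mutually_exclusive(rh_positive, universal_donor))  # no outcome in common
print(occurs(set(), "O+"), occurs(U, "O+"))              # impossible / certain event
```

The empty set never occurs and $U$ always occurs, whatever the outcome, which is exactly the content of D4 and D5.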

6.2 Kolmogorov’s Axioms 

The probability of an event corresponds in some way to the notion of frequency of a random
phenomenon; in other words, to each event we attach a real number in the interval $[0, 1]$
which measures its probability (chance) of realization. The properties of the frequencies that
we can highlight during various trials allow us to determine the properties of probabilities.

Let $U$ be a universe. We say that we define a probability on the events of $U$ if to any event $A$
of $U$ we associate a number or measure $P(A)$, called "a priori probability of event $A$" or
"marginal probability of $A$".

A1. For any event $A$:

$$1 \geq P(A) \geq 0$$

Thus, the probability of any event is a real number between 0 and 1 inclusive (this is common
human sense...).

A2. The probability of the certain event, or of the set (sum) of possible events, is equal to 1:

$$P(U) = 1$$

A3. If $A \cap B = \emptyset$, i.e. the two events are incompatible (disjoint), then:

$$P(A \cup B) = P(A) + P(B)$$



the probability of the union ("or") of two mutually incompatible (or mutually exclusive) events
is therefore equal to the sum of their probabilities (law of addition). We then speak of
"disjoint probability".

We understand better that the third axiom requires $A \cap B = \emptyset$: otherwise the sum of
the probabilities could be greater than 1 (imagine again the set diagram of the two events in
your head!).


Consider that in a given area, over 50 years, the probability of a major earthquake
is 5% and over the same period the probability of a major flood is 10%. We would like to
know the probability that a nuclear plant meets at least one of the two events during
the same period if they are incompatible. The total probability is then the sum of the
two probabilities, which is 15%.

We will find an example of this kind of disjoint probability in the chapter of Industrial Engi- 
neering when studying F.M.E.A. (Failure Modes and Effects Analysis) for fault analysis systems 
with a complex structure. 

In other words, in a more general form, if $(A_i)_{i \in \mathbb{N}}$ is a sequence of pairwise
disjoint events ($A_i$ and $A_j$ cannot occur at the same time when $i \neq j$) then:

$$P\left(\bigcup_{i \in \mathbb{N}} A_i\right) = \sum_{i \in \mathbb{N}} P(A_i)$$

We then speak of "$\sigma$-additivity" because, if we look more closely at the three axioms above,
the events measured by $P$ form a $\sigma$-algebra (see section Measure Theory).

Conversely, if the events are not incompatible (they can overlap, or in other words: they
have a joint probability), the probability that at least one of the two takes place is:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{6.7}$$

This means that the probability that at least one of the events $A$ or $B$ occurs is equal to the
sum of the probabilities of $A$ and of $B$, minus the probability that $A$ and $B$ occur
simultaneously (we will show later that, for independent events, this is simply equal to one
minus the probability that neither of the two occurs!).


Consider that in a given area, over 50 years, the probability of a major earthquake
is 5% and over the same period the probability of a major flood is 10%. We would like to
know the probability that a nuclear plant meets at least one of the two events during
the same period if they are not incompatible. Taking the two events as independent, so that
$P(A \cap B) = 5\% \cdot 10\% = 0.5\%$, the above equation gives 14.5%.
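
The two earthquake/flood examples can be checked with exact arithmetic. The sketch below is only illustrative: the function name `p_union` is an assumption, and in the compatible case the joint probability is taken as the product of the two probabilities, i.e. the events are assumed independent.

```python
from fractions import Fraction

def p_union(p_a, p_b, p_a_and_b=Fraction(0)):
    """P(A or B) = P(A) + P(B) - P(A and B); the last term is 0 for incompatible events."""
    return p_a + p_b - p_a_and_b

p_quake = Fraction(5, 100)
p_flood = Fraction(10, 100)

# Incompatible events: plain law of addition (axiom A3).
incompatible = p_union(p_quake, p_flood)

# Compatible events, assumed independent so that P(A and B) = P(A) * P(B).
compatible = p_union(p_quake, p_flood, p_quake * p_flood)

print(float(incompatible), float(compatible))  # 0.15 0.145
```

Using `Fraction` rather than floats keeps the results exact, which makes the 15% and 14.5% of the text easy to compare.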



And thus, if they were incompatible, we would have $A \cap B = \emptyset$ and we find again the
disjoint probability:

$$P(A \cup B) = P(A) + P(B) \tag{6.8}$$

An immediate consequence of the axioms (A2) and (A3) is the relation between the probability
of an event $A$ and that of its complement, noted $\bar{A}$ (or, more rarely, in accordance with the
notation used in the chapter of Proof Theory, $\neg A$):

$$P(\bar{A}) = 1 - P(A) \tag{6.9}$$

Let $U$ be a universe with a finite number $n$ of possible outcomes:

$$U = \{i_1, i_2, \ldots, i_n\}$$

where the events:

$$I_1 = \{i_1\}, I_2 = \{i_2\}, I_3 = \{i_3\}, \ldots, I_n = \{i_n\} \tag{6.11}$$

are called "elementary events". When these events have the same probability, we say they are
"equiprobable". In this case, it is very easy to calculate the probability. Indeed, these events
being by definition incompatible with each other at this level of our discussion, we have under
the third axiom (A3) of probabilities:

$$P(I_1 \cup I_2 \cup \ldots \cup I_n) = P(I_1) + P(I_2) + \ldots + P(I_n) \tag{6.12}$$

but since:

$$P(I_1 \cup I_2 \cup \ldots \cup I_n) = P(U) = 1$$

and the probabilities on the right-hand side are by hypothesis equal, we have:

$$P(I_1) = P(I_2) = \ldots = P(I_n) = \frac{1}{n} \tag{6.14}$$


Definition (#72): If $A$ and $B$ are not mutually exclusive but independent, we know that by their
compatibility $A \cap B \neq \emptyset$ and that (very important in statistics!):

$$P(A \cap B) = P(A) \cdot P(B) \tag{6.15}$$

the probability of the intersection ("and" operator) of two independent events is equal to the
product of their probabilities (law of multiplication). We name it "joint probability" (this is the
most common case).



Consider that in a given area, over 50 years, the probability of a major earthquake
is 5% and over the same period the probability of a major flood is 10%. Assume that these
two events are not mutually exclusive, in other words that they are compatible, and that
they are independent. We would like to know the probability that a nuclear power plant
meets both events during this same period. The above equation then gives
$5\% \cdot 10\% = 0.5\%$.

In a more general form, the events $A_1, A_2, \ldots, A_n$ are independent if the probability of the
intersection is the product of the probabilities:

$$P\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} P(A_i)$$



Be careful to not confuse "independent" and "incompatible" ! 

So far, to summarize a bit, we have:

Type                                          Expression
2 incompatible events (disjoint)              $P(A \cup B) = P(A) + P(B)$
2 not incompatible events (joint)             $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
2 not incompatible but independent events     $P(A \cap B) = P(A) \cdot P(B)$

Table 4.14 - Classical cases of probabilities

Thanks to the above definition, we can show that the probability that $A$ or $B$ takes place
(i.e. at least one of the two), for independent events, is simply equal to one minus the
probability that neither of the two occurs:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = P(A) + P(B) - P(A)P(B) = 1 - P(\bar{A})P(\bar{B}) = 1 - (1 - P(A))(1 - P(B)) \tag{6.17}$$

We can also use this definition to determine the probability that only one of the two events occurs:

$$P(A \oplus B) = P(A)P(\bar{B}) + P(B)P(\bar{A}) = P(A)(1 - P(B)) + P(B)(1 - P(A)) = P(A) + P(B) - 2P(A)P(B) = P(A) + P(B) - 2P(A \cap B) \tag{6.18}$$




Consider that in a given area, over 50 years, the probability of a major earthquake
is 5% and over the same period the probability of a major flood is 10%. We would like to
know the probability that a nuclear power plant meets exactly one of the two events (but
not both) during the same period, assuming they are independent. The above equation then
gives 14%.

There is a common and important area in the industry where the four following relations are 
frequently used: 

$$\text{AND} = P(A) \cdot P(B)$$

$$\text{OR COMPATIBLE} = P(A) + P(B) - P(A \cap B)$$

$$\text{XOR} = P(A) + P(B) - 2P(A \cap B)$$

This is "fault tree analysis" or "probabilistic tree analysis", which is used to analyse the
possible reasons for failure of a system of any kind (industrial, administrative or other).
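
As an illustration, the gate relations above can be written as three small functions. This is a sketch under the assumption that the two input events are independent, so that $P(A \cap B) = P(A)P(B)$; the function names are hypothetical and not taken from any fault tree library.

```python
from fractions import Fraction

def p_and(p_a, p_b):
    """Both independent events occur."""
    return p_a * p_b

def p_or(p_a, p_b):
    """At least one of two compatible, independent events occurs."""
    return p_a + p_b - p_a * p_b

def p_xor(p_a, p_b):
    """Exactly one of two independent events occurs."""
    return p_a + p_b - 2 * p_a * p_b

p_quake, p_flood = Fraction(5, 100), Fraction(10, 100)
print(float(p_and(p_quake, p_flood)))  # 0.005
print(float(p_or(p_quake, p_flood)))   # 0.145
print(float(p_xor(p_quake, p_flood)))  # 0.14
```

The three printed values are the 0.5%, 14.5% and 14% obtained in the earthquake/flood examples, and the OR gate also equals one minus the probability that neither event occurs.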

To close this part of the chapter consider the following figure displaying Venn diagrams (see 
section Set Theory) for all 16 events (including the impossible event) that can be described in 
terms of two given events A and B. In each case, the event is represented by the red area: 

Consider the situation where $A$ represents an earthquake, $B$ represents a major flood and $U$
the universe of all dramatic events for a nuclear power plant. We consider that the two events
are independent. Then each of the 16 events can be described as follows, either mathematically
or verbally.

1. $U$: An earthquake can occur, or a flood, or nothing, or both together, or any other event
(in short: any event can occur).

$$P(U) = 1 = 100\% \tag{6.20}$$

2. $A \cup B$: Any event with an earthquake, a flood, or both together can occur.

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = P(A) + P(B) - P(A)P(B) \tag{6.21}$$

3. $A \cup B^c$: Any event can occur except those with a flood not associated with an earthquake.

$$P(A \cup B^c) = P(U) - P(B) + P(A \cap B) = 1 - P(B) + P(A)P(B) \tag{6.22}$$

4. $A^c \cup B$: Any event can occur except those with an earthquake not associated with a flood.

$$P(A^c \cup B) = P(U) - P(A) + P(A \cap B) = 1 - P(A) + P(A)P(B) \tag{6.23}$$






Figure 4.49 - Possible Venn diagrams for two events 

5. $A^c \cup B^c$: Any event can occur except those associating an earthquake together with
a flood.

$$P(A^c \cup B^c) = P(U) - P(A \cap B) = 1 - P(A)P(B) \tag{6.24}$$

6. $A$: Any event with an earthquake can occur (this includes the events associating an
earthquake and a flood).

$$P(A) = P(A)$$

7. $B$: Any event with a flood can occur (this includes the events associating a flood and an
earthquake).

$$P(B) = P(B)$$

8. $(A \cap B) \cup (A^c \cap B^c)$: Any event can occur except those including an earthquake
without a flood or those including a flood without an earthquake.

$$P((A \cap B) \cup (A^c \cap B^c)) = P(U) - P(A) - P(B) + 2P(A \cap B) = 1 - P(A) - P(B) + 2P(A)P(B)$$

9. $(A \cap B^c) \cup (A^c \cap B)$: Any event including an earthquake without a flood, or a flood
without an earthquake, can occur.

$$P((A \cap B^c) \cup (A^c \cap B)) = P(A) + P(B) - 2P(A \cap B) \tag{6.28}$$



10. $B^c$: Any event except those including a flood can occur.

$$P(B^c) = P(U) - P(B) = 1 - P(B) \tag{6.29}$$

11. $A^c$: Any event except those including an earthquake can occur.

$$P(A^c) = P(U) - P(A) = 1 - P(A) \tag{6.30}$$

12. $A \cap B$: Any event associating an earthquake and a flood together can occur.

$$P(A \cap B) = P(A)P(B) \tag{6.31}$$

13. $A \cap B^c$: Any event with an earthquake and without a flood can occur.

$$P(A \cap B^c) = P(A) - P(A)P(B) \tag{6.32}$$

14. $A^c \cap B$: Any event with a flood and without an earthquake can occur.

$$P(A^c \cap B) = P(B) - P(A)P(B) \tag{6.33}$$

15. $A^c \cap B^c$: Any event can occur except those including an earthquake and/or a flood.

$$P(A^c \cap B^c) = P(U) - P(A \cup B) = 1 - P(A) - P(B) + P(A)P(B) \tag{6.34}$$

16. $A \cap A^c$ or $B \cap B^c$: Impossible event.

$$P(A \cap A^c) = P(B \cap B^c) = P(\emptyset) = 0 \tag{6.35}$$
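
The closed forms in this list can be cross-checked by brute force. The following sketch assumes independence, as the text does, and uses the 5% and 10% values of the running example; it splits the universe into the four atoms (earthquake or not, flood or not), so that every one of the $2^4 = 16$ events is a union of atoms. The names `atoms` and `prob` are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

p_a, p_b = Fraction(5, 100), Fraction(10, 100)  # earthquake, flood (example values)

# Probability of each atom (quake?, flood?), assuming the two events are independent.
atoms = {
    (quake, flood): (p_a if quake else 1 - p_a) * (p_b if flood else 1 - p_b)
    for quake, flood in product([True, False], repeat=2)
}

def prob(event):
    """Sum the probabilities of the atoms for which the predicate `event` holds."""
    return sum(p for (quake, flood), p in atoms.items() if event(quake, flood))

n_events = 2 ** len(atoms)  # every event is a union of atoms

print(n_events)
print(prob(lambda a, b: a or b) == p_a + p_b - p_a * p_b)                   # item 2
print(prob(lambda a, b: a != b) == p_a + p_b - 2 * p_a * p_b)               # item 9
print(prob(lambda a, b: not a and not b) == 1 - p_a - p_b + p_a * p_b)      # item 15
```

Each comparison evaluates to `True`, which confirms in particular the signs used in items 9 and 15.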

6.3 Conditional Probabilities 

What can we infer about the probability of an event $B$ knowing that an event $A$ has occurred,
given that there is a link between $A$ and $B$? In other words, if there is a link between $A$ and
$B$, the realization of $A$ has to change our knowledge of $B$, and we want to know if it is
possible to define the conditional probability of an event relative to another event.

This type of probability is called "conditional probability" or "a posteriori probability" of $B$
knowing $A$, and is denoted in the context of the study of conditional probabilities:

$$P(B/A) \tag{6.36}$$

and often in practice to avoid confusion with a possible division:

$$P(B \mid A) \tag{6.37}$$

and we sometimes find in U.S. books the notation: 

P(B A A) (6.38) 


or also: 

We also have the case:

$$P(A/B)$$

which is called "likelihood function of $A$" or "a priori probability of $A$ given $B$".

Historically, the first mathematician to have used the correct notion of conditional probability 
was Thomas Bayes (1702-1761). This is why we often say "Bayes" or "Bayesian" probabilities 
as soon as conditional probabilities are involved: "Bayes formula", "Bayesian statistics", etc. 

The notion of conditional probability that we will introduce is much less simple than it first
appears, and conditional problems are an inexhaustible source of errors of all kinds (there
are famous paradoxes on the subject and even experts require peer review to minimize mistakes).

Let's start with a simple example: Suppose we have two dice. Now imagine that we have only
thrown the first die. We want to know the probability that, by throwing the second die, the
sum of the two numbers is equal to a given minimum value. The probability of obtaining this
minimum value given the value of the first die is totally different from the probability of
obtaining the same minimum value by throwing the two dice at the same time. How do we
calculate this new probability?

Let us now formalize the process! After the throw of the first die, we have:

$$A = \{\text{the result of the first throw is...}\} \tag{6.41}$$

Under the hypothesis that $B \subset A$, we feel that $P(B/A)$ must be proportional to $P(B)$, the
proportionality constant being determined by the normalization:

$$P(A/A) = 1 \tag{6.42}$$

Now let $B \subset A^c$ ($B$ is included in the complement of $A$, so that the events are mutually
exclusive). It is then relatively intuitive that under this hypothesis of incompatibility we
have the conditional probability:

$$P(B/A) = 0 \tag{6.43}$$

This leads us to the following definitions of, respectively, a posteriori and a priori
probabilities:

$$P(B/A) = \frac{P(A \cap B)}{P(A)} \qquad P(A/B) = \frac{P(A \cap B)}{P(B)}$$

Thus, knowing that $A$ has occurred reduces the set of possible outcomes of the universe $U$ for
$B$. From there, only the events of type $A \cap B$ are important. The probability of $A$ given $B$,
or vice versa (by symmetry), must be proportional to $P(A \cap B)$!

The coefficient of proportionality is the denominator, and it ensures the normalization of the
certain event. Indeed, if two events $A$ and $B$ are independent (think of the black cat and the
ladder, for example), then we have:

$$P(A \cap B) = P(A)P(B)$$




and then we see that $P(B/A)$ is equal to $P(B)$, and therefore the event $A$ adds no information
about $B$, and vice versa! In other words, if $A$ and $B$ are independent, we have:

$$P(B/A) = P(B) \quad \text{and} \quad P(A/B) = P(A) \tag{6.46}$$

Another fairly intuitive way to see things is to represent the probability measure $P$ as a
measure of the areas (surfaces) of subsets of $\mathbb{R}^2$.

Indeed, if $A$ and $B$ are two subsets of respective areas $P(A)$ and $P(B)$, then to the question
of the probability that a point of the plane belongs to $B$ knowing that it belongs to $A$, it is
quite obvious that the answer is given by:

$$\frac{\text{Surface}(A \cap B)}{\text{Surface}(A)} \tag{6.47}$$

We would like to indicate that the definition of conditional probability is often used in the
following form:

$$P(A \cap B) = P(A/B)P(B) = P(B/A)P(A)$$

called the "formula of compound probabilities". Thus, the a posteriori probability of $B$ knowing
$A$ can also be written as:

$$P(B/A) = \frac{P(A/B)P(B)}{P(A)}$$

The way this formula updates the probability of a hypothesis $B$ in light of some body of data
$A$ is named the "diachronic interpretation". "Diachronic" means that something is happening
over time; in this case the probability of the hypothesis changes, over time, as we see new data.

In this interpretation the different terms have a name:

• $P(B)$ is the probability of the hypothesis before we see the data, named, as we already
know, the "prior probability", or just "prior".

• $P(B/A)$ is what we want to compute, the probability of the hypothesis after we see the
data, named, as we already know, the "posterior".

• $P(A/B)$ is the probability of the data under the hypothesis, named the "likelihood".

• $P(A)$ is the probability of the data under any hypothesis, named the "normalizing constant".



Suppose a disease like meningitis. The probability of having meningitis will
be denoted by $P(M) = 0.001$ (arbitrary value for the example) and the probability of a sign
of this disease, like headache, will be denoted $P(S) = 0.1$. We assume known that the
conditional probability of having a headache if we have meningitis is:

$$P(S/M) = 0.9 \tag{6.50}$$

Bayes' theorem then gives the probability of having meningitis if we have a headache:

$$P(M/S) = \frac{P(S/M)P(M)}{P(S)} = \frac{0.9 \cdot 0.001}{0.1} = 0.009$$
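
The meningitis computation is a direct application of the relation posterior = likelihood times prior divided by normalizing constant. A minimal sketch, with a hypothetical helper named `bayes` and the (arbitrary) values of the example:

```python
from fractions import Fraction

def bayes(likelihood, prior, evidence):
    """Posterior P(H/D) = P(D/H) * P(H) / P(D)."""
    return likelihood * prior / evidence

p_m = Fraction(1, 1000)         # P(M): prior probability of meningitis
p_s = Fraction(1, 10)           # P(S): probability of a headache
p_s_given_m = Fraction(9, 10)   # P(S/M): likelihood of a headache given meningitis

p_m_given_s = bayes(p_s_given_m, p_m, p_s)
print(float(p_m_given_s))  # 0.009
```

Even with a 90% likelihood, the posterior stays below 1% because the prior $P(M)$ is a hundred times smaller than $P(S)$.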




We also note that, if the events $B_1, B_2, \ldots, B_n$ form a partition of $U$:

$$P(A) = P((B_1 \cup B_2 \cup \ldots \cup B_n) \cap A)$$
$$= P((B_1 \cap A) \cup (B_2 \cap A) \cup \ldots \cup (B_n \cap A))$$
$$= P(B_1 \cap A) + P(B_2 \cap A) + \ldots + P(B_n \cap A)$$
$$= P(A/B_1)P(B_1) + P(A/B_2)P(B_2) + \ldots + P(A/B_n)P(B_n) = \sum_{i=1}^{n} P(A/B_i)P(B_i)$$

So we can know the probability of the event $A$ knowing the elementary probabilities $P(B_i)$ of
its causes and the conditional probabilities of $A$ for each $B_i$:

$$P(A) = \sum_{i=1}^{n} P(A/B_i)P(B_i)$$

which is called the "formula of total probabilities" or "total probabilities theorem". But also,
for any $j$, we have the following corollary using the previous results, which gives us, following
an event $A$, the probability that it is the cause $B_j$ that produced it:

$$P(B_j/A) = \frac{P(B_j \cap A)}{P(A)} = \frac{P(A/B_j)P(B_j)}{\sum_{i=1}^{n} P(A/B_i)P(B_i)}$$

which is the general form of the "Bayes formula" or "Bayes' theorem" that we will use a little
in the Statistical Mechanics chapter and through the study of the theory of queues (see section
Quantitative Management). You should know that the implications of this theorem are, however,
considerable in daily life, in medicine, in industry and in the field of Data Mining.
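
The formula of total probabilities and the general Bayes formula can be sketched in a few lines. The function names and the three-cause numbers below are purely hypothetical and only serve to exercise the formulas; the causes $B_i$ are assumed to form a partition of $U$, so their priors sum to 1.

```python
from fractions import Fraction

def total_probability(likelihoods, priors):
    """P(A) = sum over i of P(A/B_i) * P(B_i), for a partition B_1, ..., B_n."""
    return sum(l * p for l, p in zip(likelihoods, priors))

def posteriors(likelihoods, priors):
    """P(B_j/A) for every cause B_j (general form of the Bayes formula)."""
    evidence = total_probability(likelihoods, priors)
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Hypothetical data: three causes with priors P(B_i) and likelihoods P(A/B_i).
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]

print(total_probability(likelihoods, priors))  # 21/1000
print(posteriors(likelihoods, priors))         # 5/21, 2/7, 10/21
```

The posteriors always sum to 1, since the denominator is exactly the sum of the numerators.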

We often find in the literature many examples of applications of the previous relation with only
two possible outcomes $B_1$, $B_2$ with respect to the event $A$. Therefore we find the Bayes formula
written in the following form for each outcome:

$$P(B_1/A) = \frac{P(A/B_1)P(B_1)}{P(A/B_1)P(B_1) + P(A/B_2)P(B_2)} = \frac{P(A/B_1)P(B_1)}{P(A/B_1)P(B_1) + P(A/\bar{B}_1)P(\bar{B}_1)}$$

$$P(B_2/A) = \frac{P(A/B_2)P(B_2)}{P(A/B_1)P(B_1) + P(A/B_2)P(B_2)} = \frac{P(A/B_2)P(B_2)}{P(A/B_2)P(B_2) + P(A/\bar{B}_2)P(\bar{B}_2)}$$




and note that in this particular case (binary outcomes):

$$P(B_1/A) + P(B_2/A) = P(B_1/A) + P(\bar{B}_1/A)$$
$$= \frac{P(A/B_1)P(B_1)}{P(A/B_1)P(B_1) + P(A/B_2)P(B_2)} + \frac{P(A/B_2)P(B_2)}{P(A/B_1)P(B_1) + P(A/B_2)P(B_2)}$$
$$= \frac{P(A/B_1)P(B_1) + P(A/B_2)P(B_2)}{P(A/B_1)P(B_1) + P(A/B_2)P(B_2)} = 1$$

which is an intuitive result.

For binary events, we also have (returning to the theorem of total probabilities seen above):

$$P(A) = \sum_i P(A/B_i)P(B_i) = P(A/B_1)P(B_1) + P(A/B_2)P(B_2) = P(A/B_1)P(B_1) + P(A/\bar{B}_1)P(\bar{B}_1)$$



E1. A disease affects 10 people out of 10,000 (0.1% = 0.001). A test has been developed
which has 5% false positives (people not having the disease but for whom the test says
they are affected) but always detects the disease if a person has it. What is the
probability that a random person for whom the test gives a positive result really has
this disease?

Out of 10,000 people, about 500 are therefore false positives, and we know that 10 people
really have the disease. With $M$ the event "the person has the disease" and $T$ the event
"the test is positive", we have:

$$P(M) = 0.001$$

$$P(T) = P(T/M)P(M) + P(T/\bar{M})P(\bar{M}) = 1 \cdot 0.001 + 0.05 \cdot 0.999 = 0.05095$$

Then the probability that somebody who has a positive test result is really sick is:

$$P(M/T) = \frac{P(T/M)P(M)}{P(T)} = \frac{0.001}{0.05095} \approx 1.96\%$$

This is often a shocking and counter-intuitive result. It also highlights why diagnostic
tests must be extremely reliable!

E2. Two machines $M_1$ and $M_2$ produce respectively 100 and 200 pieces. $M_1$ produces
5% defective pieces and $M_2$ produces 6% (conditional probabilities of a defect given the
machine). What is the probability that a defective piece (event $A$) was manufactured by
the machine $M_1$? We then have:

$$P(M_1/A) = \frac{P(A/M_1)P(M_1)}{\sum_i P(A/M_i)P(M_i)} = \frac{P(A/M_1)P(M_1)}{P(A/M_1)P(M_1) + P(A/M_2)P(M_2)} = \frac{\frac{5}{100} \cdot \frac{100}{300}}{\frac{5}{100} \cdot \frac{100}{300} + \frac{6}{100} \cdot \frac{200}{300}} = \frac{5}{17} \approx 29.4\%$$

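E3. From a batch of 10 pieces with 30% defective, we take a sample of size 3 without
replacement. What is the probability that the second piece is correct (whatever the first
one is)?

With $A$ the event "the second piece is correct", $B_1$ "the first piece is correct" and $B_2$
"the first piece is defective", we have:

$$P(A) = \sum_i P(A/B_i)P(B_i) = P(A/B_1)P(B_1) + P(A/B_2)P(B_2) = \frac{6}{9} \cdot \frac{7}{10} + \frac{7}{9} \cdot \frac{3}{10} = 70\%$$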




E4. We conclude with an important example for companies where employees have, several
times in their career, to pass exams or assessments in the form of multiple choice
questions (MCQs). If an employee responds to a question there are two possibilities: either
he knows the answer or he tries to guess it. Let $p$ be the probability that the employee
knows the answer and therefore $1 - p$ that he guesses it. We admit that the employee who
guesses will answer correctly with a probability of $1/m$ where $m$ is the number of proposed
answers. What is the probability that an employee (really) knows the answer to a
question with 5 choices if he answered correctly?

Let $B$ and $A$ be respectively the events "the employee knows the answer" and "the employee
correctly answers the question". Then the probability that an employee (really) knows the
answer to a question that he answered correctly is:

$$P(B/A) = \frac{P(A/B)P(B)}{P(A/B)P(B) + P(A/\bar{B})P(\bar{B})} = \frac{1 \cdot p}{1 \cdot p + \frac{1}{m}(1 - p)} = \frac{mp}{1 + (m - 1)p}$$
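
The four examples E1 to E4 can be re-computed with exact fractions. This is only a sketch: the helper `posterior` is a hypothetical name for the binary Bayes formula, and the value $p = 1/2$ used for E4 is an arbitrary assumption, since the text leaves $p$ symbolic.

```python
from fractions import Fraction

def posterior(prior, like_if_true, like_if_false):
    """Binary Bayes formula: P(B/A) from P(B), P(A/B) and P(A/not B)."""
    numerator = like_if_true * prior
    return numerator / (numerator + like_if_false * (1 - prior))

# E1: disease prevalence 0.1%, test always positive if sick, 5% false positives.
e1 = posterior(Fraction(1, 1000), Fraction(1), Fraction(5, 100))

# E2: M1 makes 100 of 300 pieces with 5% defects, M2 the other 200 with 6% defects.
e2 = posterior(Fraction(100, 300), Fraction(5, 100), Fraction(6, 100))

# E3: total probability that the second piece is correct (7 good pieces out of 10).
e3 = Fraction(6, 9) * Fraction(7, 10) + Fraction(7, 9) * Fraction(3, 10)

# E4: m = 5 choices, with an assumed probability p = 1/2 of knowing the answer.
p, m = Fraction(1, 2), 5
e4 = posterior(p, Fraction(1), Fraction(1, m))

print(float(e1), e2, e3, e4)
```

For E4 the result agrees with the closed form $mp / (1 + (m - 1)p)$, which gives $5/6$ for the assumed $p = 1/2$.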



Bayesian analysis provides also a powerful tool to formalize reasoning under uncertainty and 
the examples we have shown above illustrate how this tool can be difficult to use. 

6.3.1 Conditional Expectation 

Now, we will see to the continuous version of the conditional probability by introducing the 
subject directly with a particular example (the general theory being indigestible) infinitely im- 
portant in the field of social statistics and quantitative finance. However, this choice (the study 
of a particular case) implies that the reader has read the first chapter of Statistics to study the 
functions of continuous distributions and especially that of the Pareto law. 

So here is the scenario: often, in social sciences or economics, the literature dealing
with Pareto laws contains statements like the following (but almost never with a detailed proof):
whatever your income, the average income of those who have an income above yours is in a
constant ratio, greater than 1, to your income, provided that income follows a Pareto random variable. We then
say that the law is isomorphic to any truncated part of itself.

Let's see what it is exactly:

Let X be a random variable equal to the income and following a Pareto law with the density (see
section Statistics):

$$f(x) = k \frac{x_m^k}{x^{k+1}}$$ (6.62)

with $k > 1$, $x_m > 0$, $x \ge x_m$ and that has for distribution function (see also the Statistics
chapter for the detailed proof):

$$P(X \le x) = 1 - \left(\frac{x_m}{x}\right)^k$$ (6.63)


The sentence begins with "whatever your income", so we select an arbitrary income $x_0 > x_m$.

Now we need to compute "the average income of those with an income higher than $x_0$". We are
therefore asked to calculate the expectation (average income) of a new random variable Y that is
equal to X, but restricted to the population of people with an income above $x_0$:

$$Y = X \mid (X > x_0)$$

The distribution function of Y is given by:

$$P(Y \le x) = P(X \le x \mid X > x_0)$$



This expression is of course equal to zero if $x < x_0$. So far we have only set up the vocabulary.
First recall the following conditional probability relation already seen before:

$$P(A/B) = \frac{P(A \cap B)}{P(B)}$$

For $x > x_0$ we have for the conditional law:

$$P(X \le x \mid X > x_0) = \frac{P(x_0 < X \le x)}{P(X > x_0)}$$


Before going further, you should be aware that although the numerator and denominator are computed separately,
the ratio as a whole must be considered as the distribution function of a single random variable,
which we denote Y. Furthermore, only the numerator depends on the variable x. The denominator
can be considered as a normalization constant.

So we see that the density of Y is given by the function:

$$f_Y(y) = \begin{cases} 0 & y < x_0 \\ \dfrac{f(y)}{P(X > x_0)} & y \ge x_0 \end{cases}$$

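Now we can calculate the expectation of Y:

$$E(Y) = \int_{x_0}^{+\infty} y f_Y(y)\,dy = \int_{x_0}^{+\infty} \frac{y f(y)}{P(X > x_0)}\,dy = \frac{1}{P(X > x_0)} \int_{x_0}^{+\infty} y \frac{k x_m^k}{y^{k+1}}\,dy = \frac{k x_m^k}{P(X > x_0)} \int_{x_0}^{+\infty} \frac{1}{y^k}\,dy = \frac{k x_m^k}{P(X > x_0)(k - 1)x_0^{k-1}}$$

Knowing that (see section Statistics):

$$P(X > x_0) = \int_{x_0}^{+\infty} k \frac{x_m^k}{x^{k+1}}\,dx = \left(\frac{x_m}{x_0}\right)^k$$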


We finally have:

$$E(Y) = \frac{k}{k - 1} x_0$$

E(Y) thus represents the average income of those with an income above $x_0$ and, as can be seen
from the above equality, it is in a constant ratio $k/(k - 1)$, greater than 1, to your income $x_0$.

We can check this result with a Monte Carlo simulation in spreadsheet software (it is
worth mentioning because the approach generalizes to situations not computable by hand). You just need to
simulate the inverse of the distribution function, with u a uniform random number between 0 and 1:

$$x = \frac{x_m}{(1 - u)^{1/k}} = \left(\frac{x_m^k}{1 - u}\right)^{1/k}$$

in Microsoft Excel 11.8346 (with $x_m$ in cell B7 and k in cell B6):

=($B$7^$B$6/(1-RANDBETWEEN(0,9999)/10000))^(1/$B$6)

and then take the average of the simulated values that are greater than or equal to a given X (which corresponds
to $x_0$) and make sure that we get the result proved above!
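The same Monte Carlo check can be sketched outside a spreadsheet. The code below is only an illustration under assumed values ($x_m = 1$, $k = 3$, $x_0 = 2$, a fixed seed); the function names are hypothetical. It samples the Pareto law by inverting the distribution function and averages the values above $x_0$.

```python
import random


def pareto_sample(rng, x_m, k):
    """Draw one Pareto value by inverting F(x) = 1 - (x_m / x)**k."""
    u = rng.random()  # uniform in [0, 1), so 1 - u is never zero
    return x_m / (1.0 - u) ** (1.0 / k)


def conditional_mean(x_m, k, x_0, n=200_000, seed=12345):
    """Monte Carlo estimate of E(X | X > x_0)."""
    rng = random.Random(seed)
    above = [x for x in (pareto_sample(rng, x_m, k) for _ in range(n)) if x > x_0]
    return sum(above) / len(above)


# Assumed parameters for the illustration.
estimate = conditional_mean(x_m=1.0, k=3.0, x_0=2.0)
theory = 3.0 / (3.0 - 1.0) * 2.0  # k / (k - 1) * x_0

print(estimate, theory)
```

The estimate should be close to the theoretical value $k x_0 / (k - 1)$; the agreement improves with k larger than 2, where the variance of the Pareto law is finite.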

Obviously, we could also calculate the conditional variance (and hence the conditional standard
deviation). It may come one day...

6.3.2 Bayesian Networks 

Bayesian networks are simply a graphical representation of a conditional probability
problem, used to better visualize the interactions between the different variables when these
become numerous.

This technique is increasingly used in decision-support software (data mining), artificial
intelligence (A.I.) and also in risk analysis and management (ISO 31010 standard).

Bayesian networks are by definition directed acyclic graphs (see section Graphs Theory), so that
an event cannot (even indirectly) influence its own probability, with a quantitative description of
the dependencies between events.

These graphs are used both as knowledge representation models and as machines for calculating
conditional probabilities. They are mainly used for diagnosis (medical and industrial), risk
analysis (diagnosis of failures, faults or accidents), spam detection (Bayesian filter), opinion analysis
of voice, text and images, detection of fraud or bad payers, as well as data mining (M.K.M.: Mining
and Knowledge Management) in general.



Many systems and software packages, based on drawings or on information in existing databases,
exist to build and analyse Bayesian networks. Paid solutions: SQL Server, Oracle,
Hugin. Free solutions (at this date): Tanagra, Microsoft Belief Network MSBNX 1.4.2,
RapidMiner. Personally I prefer the simplicity of the small software MSBNX from
Microsoft. For information, in 15 years of professional experience as a consultant I have
so far met only one company out of more than 800 multinationals that used Bayesian
networks... (in transportation).

Using a Bayesian network amounts to doing "Bayesian inference": based on observed
information, we calculate the probability of data that are possible but not observed.

For a given domain (e.g. medical), we describe the causal relations between variables of interest
by a graph (we no longer need to specify that it is acyclic). In this graph, the causal relations
between variables are not deterministic, but probabilistic. Thus, the observation of one or
several causes does not always imply the effect or effects that depend on them, but only changes
the probability of observing them.

The particular interest of Bayesian networks is to consider simultaneously a priori knowledge 
of experts (in the graph) and experience contained in the data. 

Example of 5 variables with relations (directed acyclic graph) and numbering of states/variables: 



Obviously, the construction of the causal graph is based primarily on feedback from experience (REX)
and sometimes on standards or reports of expert committees. In computing, the
causal graph changes automatically depending on the content of databases (think of the Amazon
bookstore's real-time targeted advertisements based on your past purchases, or of Apple's
Genius service). But we can rarely think of all possibilities, and there will sometimes be hidden
states between two known states that have been forgotten and that would have allowed us to
model the situation better.

Suppose in the example above that, with the help of a corporate database, we know that in about
100,000 man-days we had in the company 1,000 accidents (i.e. 1% of the total) and 100 machine
failures (i.e. 0.1% of the total). We then represent this in the traditional form as follows:

P(S1) = [1%, 99%]    P(S2) = [0.1%, 99.9%]

where the subset S2, S4, S5 is what experts name a "serial or linear connection" and
the triplet S3, S2, S4 is called a "divergent relation" (if the arrows were reversed for the triplet,
we would have a "converging relation").

Before going further with our example we will make some observations in relation to these 
three types of relations: 

For clarity, we distinguish first "conditional independence" and "conditional dependence". 

We say that events A and C are "conditionally independent" if given an event B the following 


equality holds: 

P(A/B) = P(A/B,C) (6.72) 

So the term "conditional" implies the presence of B and the fact that C does not influence the 
probability of the event A. 

About "conditional dependence", this time we can distinguish three types of relations. 

1. The conditional dependence of the following type is called a "serial or linear connection" 
(already mentioned above): 

where A, B and C are dependent (in this particular example there are 3 dependent nodes 
A, B and C, but in general this dependence relates to all nodes if there were more than 3) 

In addition, A and C are conditionally dependent on B. But if the variable B is known, A
no longer provides any useful information about C (the path of uncertainty is somehow
broken) and therefore A and C become conditionally independent. The conditional
probability then simplifies as follows:

P(C/B, A) = P(C/B ) (6.73) 

2. The conditional dependence of the following type is called a "divergent connection" (as 
already mentioned above): 

Figure 4.53 - Divergent Bayesian network 



In addition, B and C are conditionally dependent on A. But if A is known, B does not 
provide any more information on C (again the path of uncertainty is somehow broken) 
and therefore B and C become independent. We then have for example if A is known: 

P(C/A, B) = P(C/A) (6.74)

3. The conditional dependence of the following type is called a "convergent connection"
or "V-structure" (as already mentioned above):

where this time the parents are independent. So B and C are independent but become
conditionally dependent once their common child A is known. If A is known, we have in general:

$$P(B/A, C) \neq P(B/A)$$ (6.75)

The dependence between parents therefore requires the observation of their common child.

Now, for a concrete example, suppose our database gives us (thanks to quality managers
who always record the quality issues) that when a machine failure occurred, 99 times out of 100
(99%) there was a total production stop (i.e. 1% of the time there was no production stop), and
that without any machine failure there was nevertheless a production stop 1% of the time. We traditionally represent this as


P(S1) = [1%, 99%]    P(S2) = [0.1%, 99.9%]

So the "implicit probability" that there is a production stop is given by:

$$P(S4) = P(S4/S2)P(S2) + P(S4/\overline{S2})P(\overline{S2}) = 99\% \cdot 0.1\% + 1\% \cdot 99.9\% = 1.098\%$$


This value represents the implicit proportion of production stops among the 100,000 man-days
(so we can give the proportion of rows in the database that represent a production stop regardless
of the cause, even without knowing the details of the database).

It then follows immediately that the implicit probability that there is no production stop is given by:

$$P(\overline{S4}) = 1 - P(S4) = 98.902\%$$ (6.77)

This is consistent with what the freeware MSBNX 1.4.2 gives us:



Now suppose we observed a production stop. What is the a posteriori probability that it is due
to a machine failure? We then have:

$$P(S2/S4) = \frac{P(S4/S2)P(S2)}{P(S4)} = \frac{99\% \cdot 0.1\%}{1.098\%} \approx 9.02\%$$ (6.78)

We can also check this with the software MSBNX 1.4.2: 

Figure 4.57 - A Posteriori probability of a production stop due to a machine failure in MSBNX 1.4.2 

Now, imagine that our database gives us (again thanks to quality managers who made sure to
input quality issues) that 99 times out of 100 (99%) when there was a production stop, there
was an evacuation. However, an evacuation also took place in 5% of the cases that had nothing to do with a
production stop (evacuations due to fire exercises or other events):


P(S2) = [0.1%, 99.9%]

Figure 4.58 - 2nd level Bayesian network 

Now to calculate the implicit probability of evacuations in relation to machine failures,
recall that in a serial conditional dependence the conditional
probability depends only on the direct parent. Thus, we get:

$$P(S5) = P(S5/S4)P(S4) + P(S5/\overline{S4})P(\overline{S4}) = 99\% \cdot 1.098\% + 5\% \cdot 98.902\% \approx 6.03\%$$


We can also check this with the software MSBNX 1.4.2: 



Figure 4.59 - Implicit probability of an evacuation in MSBNX 1.4.2 

So the implicit probability of an evacuation depends on machine failures only through its direct parent, the production stop.

Now suppose we have observed an evacuation. We want to know the a posteriori probability
that it is due to a production stop! Then we have:

$$P(S4/S5) = \frac{P(S5/S4)P(S4)}{P(S5)} = \frac{99\% \cdot 1.098\%}{6.03\%} \approx 18.02\%$$ (6.80)

We can also check this with the software MSBNX 1.4.2: 


Figure 4.60 - A Posteriori probability of an evacuation due to a machine failure in MSBNX 1.4.2 
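The serial chain S2 -> S4 -> S5 (machine failure, production stop, evacuation) can also be checked with a few lines of code instead of MSBNX. This is a minimal sketch only: the helper names `total_probability` and `bayes` are illustrative assumptions, and the conditional probabilities are those quoted in the text.

```python
def total_probability(p_child_given_parent, p_child_given_not_parent, p_parent):
    """P(child) obtained by conditioning on the direct parent only."""
    return (p_child_given_parent * p_parent
            + p_child_given_not_parent * (1.0 - p_parent))


def bayes(p_child_given_parent, p_parent, p_child):
    """P(parent / child) by Bayes' formula."""
    return p_child_given_parent * p_parent / p_child


p_s2 = 0.001                                  # machine failure
p_s4 = total_probability(0.99, 0.01, p_s2)    # production stop
p_s2_given_s4 = bayes(0.99, p_s2, p_s4)       # failure given a stop
p_s5 = total_probability(0.99, 0.05, p_s4)    # evacuation
p_s4_given_s5 = bayes(0.99, p_s4, p_s5)       # stop given an evacuation

for name, value in [("P(S4)", p_s4), ("P(S2/S4)", p_s2_given_s4),
                    ("P(S5)", p_s5), ("P(S4/S5)", p_s4_given_s5)]:
    print(f"{name} = {value:.4%}")
```

The printed values should match, up to rounding, the 1.098%, 9.02%, 6.03% and 18.02% obtained above.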

Now we study the case with the alarm and again a database allows us to build a table with 
different probabilities: 


P(S1) = [1%, 99%]    P(S2) = [0.1%, 99.9%]

Work accident, Machine failure    P(Alarm = Yes)
Yes, Yes                          75%
Yes, No                           10%
No, Yes                           99%
No, No                            10%

(alarm probabilities as used in the numerical application below)

Figure 4.61 - 2nd level Bayesian network with second branch 

Now to calculate the implicit probability that there is an alarm, we have to consider the four
possible situations. We then use the theorem of total probability:

$$P(S3) = P(S3/S1, S2)P(S1)P(S2) + P(S3/\overline{S1}, S2)P(\overline{S1})P(S2) + P(S3/S1, \overline{S2})P(S1)P(\overline{S2}) + P(S3/\overline{S1}, \overline{S2})P(\overline{S1})P(\overline{S2})$$ (6.81)

which, a little more rigorously, should be written:

$$P(S3) = P(S3/S1 \wedge S2)P(S1)P(S2) + P(S3/\overline{S1} \wedge S2)P(\overline{S1})P(S2) + P(S3/S1 \wedge \overline{S2})P(S1)P(\overline{S2}) + P(S3/\overline{S1} \wedge \overline{S2})P(\overline{S1})P(\overline{S2})$$

The numerical application therefore gives for the implicit probability of an alarm:

$$P(S3) = 75\% \cdot 1\% \cdot 0.1\% + 99\% \cdot 99\% \cdot 0.1\% + 10\% \cdot 1\% \cdot 99.9\% + 10\% \cdot 99\% \cdot 99.9\% \approx 10.089\%$$

This can be built and checked as follows with MSBNX 1.4.2:



Figure 4.62 - Implicit probability in MSBNX 1.4.2 

It may be useful for the reader to know that the following notations can sometimes be found in the literature:

$$P(S3/S1 = \text{Yes}, S2 = \text{Yes}) = P(S3/S1, S2) = P(S3/S1 \wedge S2)$$



In the particular example studied above, all events have only two states. But in practice
they can have 3, 4 and more states. Therefore probabilities cross-tables quickly become 

As in the previous case, suppose we know that there was a work accident. We then wish to
calculate the a priori probability of an alarm. We then have (observe that the probability
actually depends only on the state S2, since the state S1 is completely known!):

$$P(S3/S1) = P(S3/S1, S2)P(S2) + P(S3/S1, \overline{S2})P(\overline{S2}) = 75\% \cdot 0.1\% + 10\% \cdot 99.9\% = 10.065\%$$ (6.85)

We can also check this with the software MSBNX 1.4.2: 

Figure 4.63 - Implicit probability of an alarm in MSBNX 1.4.2 (evidence WorkAccident = Yes; Alarm: Yes = 0.1007, No = 0.8994)

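So, knowing that there was a work accident slightly decreases the probability that there is an alarm (we
go from a probability of 10.089% to a probability of 10.065%), because with a work accident the alarm
probability in case of machine failure is only 75% instead of 99%.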

To complete this example, we now calculate the a posteriori probabilities P(S2/S3) and
P(S1/S3). To do this, we must first calculate the a priori probabilities P(S3/S2) and
P(S3/S1) (this last one has been calculated just before).

We have for the missing value (which can be easily checked as before with MSBNX 1.4.2):

$$P(S3/S2) = P(S3/S2, S1)P(S1) + P(S3/S2, \overline{S1})P(\overline{S1}) = 75\% \cdot 1\% + 99\% \cdot 99\% \approx 98.76\%$$

We then have: 

P(S3/S1) = 10.065% 

P(S3/S2) = 98.76% 



We now have everything we need to calculate the a posteriori probabilities P(S2/S3) and P(S1/S3):

$$P(S2/S3) = \frac{P(S3/S2)P(S2)}{P(S3)} = \frac{98.76\% \cdot 0.1\%}{10.089\%} \approx 0.9789\%$$

$$P(S1/S3) = \frac{P(S3/S1)P(S1)}{P(S3)} = \frac{10.065\% \cdot 1\%}{10.089\%} \approx 0.9976\%$$

So the a posteriori probability that there is a machine breakdown when we know that there is an
alarm is 0.979% (i.e. 99.021% that the alarm was not triggered by a machine
failure). Likewise, there is a 0.998% probability that there is a work accident when
we know there is an alarm (and thus 99.002% that it is not due to a work accident).

From a critical point of view, when there is an alarm we ultimately cannot say much.
In this case, this is due to the fact that the two events of interest (work accident and machine
failure) both have a low probability of occurring and that the employees respond quite
well at the start of the alarm (otherwise, if these probabilities were high, it would mean
that the behavior of the employees is not good, because we could guess in advance, with
exasperation, which problem occurred with good confidence).
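The alarm node, with its two independent parents, can be recomputed as follows. This is a sketch under the assumption, implicit in the products P(S1)P(S2) used above, that work accidents (S1) and machine failures (S2) are independent; the dictionary `p_alarm` and the function names are illustrative only.

```python
from itertools import product

p_s1 = 0.01    # work accident
p_s2 = 0.001   # machine failure
# P(alarm / S1, S2) for the four parent configurations (values from the text).
p_alarm = {(True, True): 0.75, (False, True): 0.99,
           (True, False): 0.10, (False, False): 0.10}


def weight(state, p, known):
    """Prior weight of a parent state, or 1 if that parent is observed."""
    if known is not None:
        return 1.0
    return p if state else 1.0 - p


def p_s3(given_s1=None, given_s2=None):
    """P(alarm), optionally conditioned on an observed parent state."""
    total = 0.0
    for s1, s2 in product([True, False], repeat=2):
        if given_s1 is not None and s1 != given_s1:
            continue
        if given_s2 is not None and s2 != given_s2:
            continue
        total += (p_alarm[(s1, s2)]
                  * weight(s1, p_s1, given_s1) * weight(s2, p_s2, given_s2))
    return total


p_alarm_total = p_s3()
p_s2_given_alarm = p_s3(given_s2=True) * p_s2 / p_alarm_total
p_s1_given_alarm = p_s3(given_s1=True) * p_s1 / p_alarm_total
print(p_alarm_total, p_s3(given_s1=True), p_s3(given_s2=True))
print(p_s2_given_alarm, p_s1_given_alarm)
```

The outputs should agree, up to rounding, with 10.089%, 10.065%, 98.76%, 0.9789% and 0.9976%.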



To conclude, the reader will have noticed that the calculations can quickly become tedious
when the graph becomes complex, which explains the use of computer software. Furthermore,
in the banking sector, which uses for example Bayesian networks for credit risk, the conditional
probabilities of interest can be more complex. For example, we might want to know the probability that
there is a machine failure knowing that we have an alarm and an accident:

P(S2/S3, S1) = P(S2/S3 = Yes, S1 = Yes) (6.89)

6.4 Martingales 

A martingale in probability (there is another one in stochastic processes) is a technique intended to
increase the chances of winning in gambling while respecting the rules of the game. The principle
depends entirely on the type of game considered, but the term is accompanied
by an aura of mystery, as if some players knew efficient secret techniques to cheat
chance. For example, many players (or would-be players) search for THE martingale that will beat
the bank in the most common casino games (institutions whose profitability relies almost
entirely on the difference, however small, between the chances of winning and losing).

Many martingales are only the dream of their author; some are actually inapplicable, and some could
actually make it possible to cheat a little. Games of chance are in general unfair: whatever the move
played, the probability of winning of the casino (or of the State in the case of a lottery) is
greater than that of the player. In this type of game, it is not possible to reverse the odds, only
to minimize the probability of the gambler's ruin.

The most common example is the roulette martingale. It consists in playing an even chance
at roulette (red or black, odd or even) to win, for example, one unit over a series of spins
by doubling the bet after each loss until we win. Example: the player bets 1 unit on red;
if red comes out, he stops playing and has won 1 unit (2 units received minus the 1 unit staked); if black comes
out, he doubles his bet to 2 units on red, and so on until he wins.


Figure 4.64 - Casino roulette wheel 

Having one chance in two of winning, he may think he will eventually win, and when he wins, he is
necessarily repaid everything he has staked plus one unit of his initial bet.

This martingale appears to be safe in practice. Note that in theory, to be sure of winning, we
would need the opportunity to play an unlimited number of times. This has major drawbacks:

This martingale is in fact limited by the bets that the player can afford, because the bet must be doubled
after every loss: 2 times the initial bet, then 4, 8, 16, ... If he loses 10 times, he must
be able to bet 1024 times his initial stake on the 11th round! That is a lot of money for
little gain!

Roulette wheels also have a "0" which is neither red nor black. The risk of losing on every
spin is therefore larger than 1/2.

In addition, to thwart this strategy, casinos offer tables with betting limits: from 1 to 100.-, from 2
to 200.-, from 5 to 500.-, ... It is therefore impossible to use this method over a large number of
spins, which increases the risk of losing everything.
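The effect of a bounded number of doublings can be quantified exactly. The sketch below assumes a European wheel with 37 pockets, 18 of them red (an assumption: the text only mentions the presence of a "0"), and a player who gives up after `max_rounds` consecutive losses; the function names are illustrative.

```python
from fractions import Fraction


def stake_after_losses(n_losses, initial=1):
    """Bet required after n consecutive losses when doubling each time."""
    return initial * 2 ** n_losses


def martingale_expectation(max_rounds, p_lose=Fraction(19, 37)):
    """Expected gain (in units) of the doubling strategy limited to max_rounds.

    The player wins 1 unit unless all max_rounds spins are lost, in which
    case the total loss is 1 + 2 + ... + 2**(max_rounds - 1) units.
    """
    p_all_lost = p_lose ** max_rounds
    total_loss = 2 ** max_rounds - 1
    return (1 - p_all_lost) * 1 - p_all_lost * total_loss


print(stake_after_losses(10))               # stake needed on the 11th round
print(float(martingale_expectation(10)))    # expected gain with at most 10 rounds
```

Algebraically the expectation reduces to $1 - (2q)^n$ with q the probability of losing a spin: it is zero for a fair game (q = 1/2) and negative as soon as q exceeds 1/2, whatever the number of rounds allowed.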

Blackjack is a game that has winning strategies: several playing techniques, which usually
require memorizing the cards, can turn the odds in favour of the player. The
mathematician Edward Thorp published in 1962 a book that was at the time a real best-seller. But
all these methods require long weeks of training and are easily detectable by the croupier (sudden
changes in the amounts of bets are typical). The casino then has the opportunity to ban from
its establishment the players using this kind of martingale.

It should be noted that there are fairly advanced methods. One of them is based on the least
played combinations. In games where the gain depends on the number of winning players
(Lotto...), playing the least played combinations maximizes gains. This is how some people sell
combinations that would be statistically very rarely used by other players.

Based on this reasoning, we can still conclude that a player who has been able to
determine statistically the least played combinations, in order to maximize his expected payoff, will
in fact certainly not be the only player to have achieved this through the analysis of these famous
combinations! This means that in theory the least played numbers are actually overplayed
combinations; the best might be to mix played numbers and overplayed numbers
to obtain the ideal combinations. Another conclusion to all this is maybe that the best is to
play random combinations, which ultimately are less likely to be chosen by the players who
incorporate a human and harmonious factor in the choice of their numbers.

6.5 Combinatorial Analysis 

"Combinatorial analysis" (counting techniques) is the field of mathematics that deals with the 
study of all the outcomes, events or facts (distinguishable or indistinguishable) with their
arrangements (combinations), ordered or not, according to some given constraints.

Definitions (#73):

D1. A sequence of objects (events, outcomes, objects, ...) is said to be "ordered" if each sequence with a
particular order of objects is recognized as a particular configuration.

D2. A sequence is "unordered" if and only if we are interested in the frequency of appearance of
objects regardless of their order.

D3. The objects (of a sequence) are said to be "distinct" if their characteristics cannot be
confused with those of the other objects.


We chose to put combinatorial analysis in this chapter because when we calculate
probabilities, we also often need to know the probability of finding a combination or
arrangement of given events under certain constraints.

Students often have difficulty remembering the difference between a permutation, an arrange- 
ment and a combination. Here is a little summary of what we’ll see: 

• Permutation: We take all the objects. 

• Arrangement: We choose objects from the original set and the order matters.

• Combination: Same as for the arrangement, but the order does not matter.

You must not forget that for each result, the inverse gives the probability of obtaining,
respectively, a given permutation/arrangement/combination!


We will present and demonstrate below the 6 most common cases from which we can find 
(usually) all others: 

6.5.1 Simple Arrangements with Repetition 

Definition (#74): A "simple arrangement with repetition" is an ordered sequence of length m of 
n distinct objects not necessarily all different in the sequence (either: with possible repetitions!). 

Let A and B be two finite sets of respective cardinals m, n such that there are trivially m ways to
choose an object in A (of type a) and n ways to choose an object in B (of type b).

We saw in the section Set Theory that if A and B are disjoint, then:

$$\text{Card}(A \cup B) = \text{Card}(A) + \text{Card}(B) = m + n$$ (6.90)

We therefore deduce the following properties:

P1. If an object cannot be at the same time of type a and of type b, and if there are m ways to
select an object of type a and n ways to choose an object of type b, then the union of
objects gives m + n selections (this is typically the result of SQL UNION queries
without filters in corporate relational database management systems).

P2. If we can choose an object of type a in m ways, then an object of type b in n ways,
then according to the Cartesian product of two sets (see section Set Theory) there are:

$$\text{Card}(A \times B) = \text{Card}(A) \cdot \text{Card}(B) = m \cdot n$$ (6.91)

ways to choose a single object of type a then an object of type b.

With the same notation for m and n, we can choose for each element of A its single image
among the n elements of B. So there are n ways to choose the image of the first element of A,
then also n ways to choose the image of the second element of A, ..., and n ways to choose the
image of the m-th element of A. The total number of possible applications from A
to B is thus equal to the product of m factors equal to n (thus the cardinality of the m-fold Cartesian
product of the set B with itself!). It is usually written in the following way (we have indicated the
different ways of writing this result that can be found in various textbooks):

$$\text{Card}(B^A) = \text{Card}(\underbrace{B \times B \times \ldots \times B}_{m \text{ times}}) = \text{Card}(B)^m = \bar{A}_n^m = n^m$$

where $B^A$ is the set of applications from A to B. The increase in the number of possibilities is
geometric (not "exponential" as it is often wrongly said!).

This result is mathematically similar to the ordered result (an arrangement where the order of 
elements in the sequence is taken into account) of m trials in a bag containing n different balls 
with replacement after each trial. In France this result is traditionally named a "p-list". 




E1. How many (ordered) "words" of 7 letters can we form from an alphabet
of 24 distinct letters (very useful to know the number of trials needed to find a password, for
example)? The solution is:

$$\bar{A}_{24}^{7} = 24^7 = 4,586,471,424$$ (6.93)

E2. How many groups of people will we have in a referendum on 5 subjects and where 
each can be either accepted or rejected? The solution is (widely used in some Swiss 

$$\bar{A}_{2}^{5} = 2^5 = 32$$ (6.94)

A simple generalization of this result can consist of the following problem statement: 

If we have m objects $k_1, k_2, \ldots, k_m$ such that each $k_i$ may take $n_i$ different values, then the number of
possible combinations is:

$$n_1 n_2 \ldots n_m$$

And if $n_1 = n_2 = \ldots = n_m = n$, then we fall back on:

$$\bar{A}_n^m = n_1 n_2 \ldots n_i \ldots n_m = n^m$$
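These counts are easy to confirm by explicit enumeration for small cases. The following sketch (the function name `count_by_enumeration` is an illustrative choice) uses the Cartesian product from the Python standard library, which mirrors the reasoning above.

```python
from itertools import product
from math import prod


def count_by_enumeration(sizes):
    """Count the ordered sequences whose i-th element can take sizes[i] values."""
    return sum(1 for _ in product(*(range(n) for n in sizes)))


# Referendum on 5 subjects, each accepted or rejected: 2**5 ballots.
print(count_by_enumeration([2] * 5))

# Generalization: objects taking 2, 3 and 4 different values respectively.
print(count_by_enumeration([2, 3, 4]), prod([2, 3, 4]))

# 7-letter words on a 24-letter alphabet (too many to enumerate comfortably).
print(24 ** 7)
```

The enumeration is only practical for small sizes; the closed form $n^m$ is what makes the password example tractable.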



6.5.2 Simple Permutations without Repetitions 

Definition (#75): A "simple permutation without repetition" (formerly called "substitution") of 
n distinct objects is an ordered (different) sequence of these n objects all different by definition 
in the sequence (without repetition). 


Be careful not to confuse the concept of permutation (of n elements among themselves) with that
of arrangement (of n elements among m)!

The number of permutations of n items can be calculated by induction: there are n places for 
a first element, n — 1 for a second element, ..., and there will be only one place for the last 
remaining element. 

It is therefore trivial that the number of permutations is given by:

$$n(n - 1)(n - 2)(n - 3) \ldots (n - (n - 1))$$ (6.97)

Recall that the product:

$$n(n - 1)(n - 2)(n - 3) \ldots (n - (n - 1)) = \prod_{i=1}^{n} i$$ (6.98)



is called the "factorial of n" and we denote it n! for $n \in \mathbb{N}$.
There are therefore, for n distinguishable elements:

$$A_n = \prod_{i=1}^{n} i = n!$$

possible permutations. This type of calculation can be useful for example in project
management (calculation of the number of different ways to introduce into a production line n different parts
ordered from external suppliers).


How many (ordered) "words" of 7 different letters without repetition can we create?

$$A_7 = 7! = 5040$$ (6.100)

This result leads us to assimilate it to the ordered results (an arrangement $A_n$ in
which the order of elements in the sequence is taken into account) of the drawing of balls that
are all different from a bag containing n distinguishable balls without replacement.

6.5.3 Simple Permutations with Repetitions 

Definition (#76): A "simple permutation with repetition" is when we consider the number of 
ordered permutations (different) of a sequence of n distinct objects not necessarily all different 
in a given quantity. 

When some elements are not all distinguishable in a sequence of objects (they are repeated in
the sequence), then the number of permutations that we can make is trivially reduced to
a smaller number than if all the elements were distinguishable.

Consider $n_i$ as the number of objects of the type i, with:

$$n_1 + n_2 + \ldots + n_k = n$$

then we write:

$$A_n(n_1, \ldots, n_k)$$ (6.102)

the number of possible permutations (yet unknown) with repetition (one or more elements in a
sequence of repetitive elements are not distinguishable by permutation).



If each of the $n_i$ positions occupied by identical elements were occupied by different elements,
the number of permutations would then have to be multiplied by each of the $n_i!$ (previous case):

$$A_n(n_1, n_2, \ldots, n_k)\, n_1!\, n_2! \ldots n_k! = n!$$

then we deduce:

$$A_n(n_1, n_2, \ldots, n_k) = \frac{n!}{n_1!\, n_2! \ldots n_k!}$$

If the n objects are all different in the sequence, we then have:

$$n_1! = n_2! = \ldots = n_k! = 1! = 1$$

and we fall back again on a simple permutation (without repetition) as:

$$A_n(n_1, n_2, \ldots, n_k) = \frac{n!}{n_1!\, n_2! \ldots n_k!} = \frac{n!}{1!\,1! \ldots 1!} = n!$$

How many (ordered) "words" can we create with the letters of the word "Mississippi"?

$$A_{11}(1, 2, 4, 4) = \frac{11!}{1!\,2!\,4!\,4!} = 34,650$$


This result leads us to assimilate it to an ordered result (a permutation A n where the order of 
elements in the sequence is not taken into account) of the trial of balls that are not all different 
from a bag containing k > n balls with limited replacement for each ball. 
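The formula for permutations with repetition can be sketched and cross-checked by brute force on a short word. The function names and the test word "banana" are illustrative assumptions; only "Mississippi" comes from the text.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def permutations_with_repetition(word):
    """n! / (n_1! n_2! ... n_k!) where n_i counts each distinct letter."""
    count = factorial(len(word))
    for multiplicity in Counter(word).values():
        count //= factorial(multiplicity)  # exact division at every step
    return count


def brute_force(word):
    """Number of distinct orderings found by listing every permutation."""
    return len(set(permutations(word)))


print(permutations_with_repetition("mississippi"))
print(permutations_with_repetition("banana"), brute_force("banana"))
```

The brute-force version is only usable for short words, since it lists all n! orderings before removing duplicates.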

6.5.4 Simple Arrangements without Repetitions

Definition (#77): A "simple arrangement without repetition" is an ordered sequence of p objects
all distinct taken from n distinct objects with $n \ge p$.

We now propose to enumerate the possible arrangements of p objects among n without
repetition. We denote $A_n^p$ the number of these arrangements.

It is easy to calculate that $A_n^1 = n$ and to check that $A_n^2 = n(n - 1)$. Indeed, there are n ways
to choose the first object and (n - 1) ways to choose the second when we already have the first.

To determine a nice expression for $A_n^p$, we reason by induction. We assume $A_n^{p-1}$ known
and we deduce that:

$$A_n^p = A_n^{p-1}[n - (p - 1)] = A_n^{p-1}(n - p + 1)$$ (6.108)


It follows that:

$$A_n^p = n(n - 1)(n - 2)(n - 3) \ldots (n - (p - 1))$$ (6.109)

Since:

$$n! = [n(n - 1)(n - 2) \ldots (n - p + 2)(n - p + 1)](n - p)(n - p - 1) \ldots 1 = A_n^p (n - p)!$$

we get:

$$A_n^p = \frac{n!}{(n - p)!}$$

This result leads us to assimilate it to the ordered results (an arrangement $A_n^p$ in which the order
of elements in the sequence is taken into account) of the drawing of p distinct balls from a bag
containing n different balls without replacement.

Consider the 24 letters of the alphabet, how many (ordered) "words" of 7 distinct
letters can we create?

$$A_{24}^{7} = \frac{24!}{(24 - 7)!} = 1,744,364,160$$ (6.112)

The reader may have noticed that if p = n we end up with:

$$A_n^n = \frac{n!}{0!} = n!$$ (6.113)

So we can say that a simple permutation of n elements without repetition is like a simple ar- 
rangement without repetition when n = p. 
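The arrangement formula can be checked numerically. The sketch below is only illustrative (the helper `arrangements` is a hypothetical name); it compares the closed form with the standard-library function `math.perm` (available from Python 3.8) and with a brute-force enumeration on a small case.

```python
from itertools import permutations
from math import factorial, perm


def arrangements(n: int, p: int) -> int:
    """Number of ordered selections of p distinct objects among n: n! / (n - p)!."""
    return factorial(n) // factorial(n - p)


# Example of the text: ordered "words" of 7 distinct letters among 24.
print(arrangements(24, 7), perm(24, 7))

# Brute-force check on a small case: ordered triples among 5 letters.
print(len(list(permutations("abcde", 3))), arrangements(5, 3))

# Case p = n: the arrangement reduces to a simple permutation, n!.
print(arrangements(5, 5) == factorial(5))
```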

6.5.5 Simple Combinations without Repetitions 

Definition (#78): A "simple combination without repetitions" or "choice function" is a non-ordered
sequence (where the order doesn't interest us!) of p elements, all different (not necessarily
in the visual sense of the word!), selected from n distinct objects. It is by definition denoted
$C_p^n$ in this book and named the "binomial" or "binomial coefficient".

If we permute the elements of each simple combination of p elements among n, we get all the simple
arrangements, and we know that each combination gives $p!$ of them. Using the notation convention of
this book we then have (contrary to the one recommended by ISO 31-11!):

$$C_p^n = \frac{A_n^p}{p!} = \frac{n!}{p!\,(n-p)!}$$




It is a relation often used in gambling but also in industry through the hypergeometric
distribution (see section Quantitative Management), as well as in quite high-level statistics like order
statistics (see chapter Statistics).

A simple way to remember this function is the following trick: suppose we must select p = 3 people
among n = 6 independently of the order. What is the number of possibilities?

We know that we have 6 · 5 · 4 = 120 ways to select them taking the order into account! The
calculation we just made is equal to n!/(n − p)! = 6!/3! = 6 · 5 · 4. But as the order must
not be taken into account, we must divide the 120 by the number of ways we can arrange the 3
people in the group. So we divide 120 by 3!, or more generally by p!. Hence
the relation above!

This result leads us to assimilate it to the unordered result (a combination $C_p^n$ in which the
order of elements of the sequence is not taken into account) of the drawing of p balls from a bag
containing n different balls without replacement.


E1. Consider the 24 letters of the alphabet, how many choices do we have to take
7 letters among the 24 without taking into account the drawing order?

$$C_7^{24} = \frac{24!}{7!\,(24-7)!} = 346{,}104$$

The same value can be obtained with the function COMBIN() of Microsoft Excel
11.8346 (English version).

E2. In a Design of Experiment (see section Industrial Engineering) we have 2 factors
of L = 3 levels each and therefore we need N = 9 runs to completely determine all
the interactions. If we consider that we can take a subset of S = 3 runs, how many
combinations of 3 among the 9 can we choose if repetitions are forbidden?

$$C_3^9 = \frac{9!}{3!\,(9-3)!} = 84$$


We understand therefore why in Design of Experiments it is important to find a method to
choose the best subset (D-optimum designs).
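The two examples above can be reproduced with a short Python sketch. The helper name `combinations_count` and the encoding of the 9 runs as pairs of levels are assumptions made for illustration; `math.comb` requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb, factorial


def combinations_count(n: int, p: int) -> int:
    """Number of unordered selections of p distinct objects among n: n! / (p! (n - p)!)."""
    return factorial(n) // (factorial(p) * factorial(n - p))


# E1: choices of 7 letters among 24, order ignored.
print(combinations_count(24, 7), comb(24, 7))

# E2: 2 factors with 3 levels each give 9 runs; enumerate every subset of 3 runs.
runs = [(a, b) for a in range(3) for b in range(3)]
subsets = list(combinations(runs, 3))
print(len(runs), len(subsets))
```

Enumerating the subsets explicitly is feasible here only because the numbers are tiny; the count grows very quickly with the number of runs, which is why dedicated selection criteria are needed.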

There is, in relation to the binomial coefficient, another relation very often used in many case
studies and also more generally in physics or functional analysis. This is "Pascal's formula":

$$C_p^n = C_{p-1}^{n-1} + C_p^{n-1}$$

Proof 4.27.2.

$$C_{p-1}^{n-1} + C_p^{n-1} = \frac{(n-1)!}{((n-1)-(p-1))!\,(p-1)!} + \frac{(n-1)!}{(n-1-p)!\,p!} = \frac{(n-1)!}{(n-p)!\,(p-1)!} + \frac{(n-1)!}{(n-1-p)!\,p!}$$

We also have $p! = p\,(p-1)!$, then:

$$(p-1)! = \frac{p!}{p}$$

and because $(n-p)(n-p-1)! = (n-p)!$:

$$(n-p-1)! = \frac{(n-p)!}{n-p}$$

Thus:

$$C_{p-1}^{n-1} + C_p^{n-1} = \frac{(n-1)!}{(n-p)!\,(p-1)!} + \frac{(n-1)!}{(n-1-p)!\,p!} = \frac{(n-1)!\,p}{(n-p)!\,p!} + \frac{(n-1)!\,(n-p)}{(n-p)!\,p!}$$

$$= \frac{(n-1)!}{(n-p)!\,p!}\,[p + (n-p)] = \frac{(n-1)!\,n}{(n-p)!\,p!} = \frac{n!}{(n-p)!\,p!} = C_p^n$$

□ Q.E.D.
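Pascal's formula is what makes it possible to build the binomial coefficients row by row using additions only. The sketch below (the function name `pascal_row` is an illustrative choice) builds a row this way and compares it with `math.comb`.

```python
from math import comb


def pascal_row(n: int) -> list[int]:
    """Row n of Pascal's triangle, built only with C(n, p) = C(n-1, p-1) + C(n-1, p)."""
    row = [1]                                    # row 0
    for _ in range(n):
        inner = [row[i] + row[i + 1] for i in range(len(row) - 1)]
        row = [1] + inner + [1]                  # the borders are always 1
    return row


for n in range(6):
    print(n, pascal_row(n))

# Each entry agrees with the closed form n! / (p! (n - p)!).
print(all(pascal_row(n)[p] == comb(n, p) for n in range(12) for p in range(n + 1)))
```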

6.5.6 Simple Combinations with Repetitions 

Definition (#79): A "simple combination with repetition" of p elements of n is a collection of 
p non-ordered elements, not necessarily distinct. 

Simple combinations with repetition are very important for the Wald-Wolfowitz statistical test 
used in economics and biology and that we will study in the Statistics section. 

We will introduce this kind of combination directly with an example, using an ingenious approach that
we owe to the physicist and winner of the 1938 Nobel Prize in Physics, Enrico Fermi.

Consider {a, b, c, d, e, f}, a set with n = 6 elements, from which we draw
p = 8 elements. We would like to calculate the number of combinations with
repetition of 8 elements drawn from this starting set of cardinal 6.

Consider, for example, the following three combinations: 

aabbbeef (6.121) 

bbdddeee (6.122) 

bbbddddd (6.123) 

where, as the order of the elements does not matter, we have grouped identical elements to make
reading easier. Now represent all the above elements by the same symbol "0" and separate the groups
of identical elements by bars (this is Enrico Fermi's trick). When one or more
elements are not included in a combination, we still write their separation bars (corresponding
to the number of missing elements + the separation of groups). Thus, the three combinations
above can be written as:

00|000|||00|0 (6.124)

|00||000|000| (6.125)

|000||00000|| (6.126)

We see above that in each case there are eight "0" (logically...) and always
five "|". The number of combinations with repetition of 8 elements drawn from a starting set of
6 elements is therefore equal to the number of permutations with repetitions of 8 + 5 = 13
symbols, so:

$$A_{13}(8, 5) = \frac{13!}{8!\,5!} = 1{,}287$$




We also see that in the general case the number of combinations with repetition (without
consideration of order) can be written:

$$\frac{(n+p-1)!}{(n-1)!\,p!}$$

It is traditional to write this:

$$\Gamma_p^n = \frac{(n+p-1)!}{(n-1)!\,p!}$$

We also see that:

$$\Gamma_p^n = \frac{(n+p-1)!}{(n-1)!\,p!} = \frac{(n+p-1)!}{((n+p-1)-p)!\,p!} = C_p^{n+p-1}$$

That we also sometimes write:

$$\Gamma_p^n = C_{n-1}^{n+p-1} = \frac{(n+p-1)!}{(n-1)!\,p!}$$
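The bars-and-zeros encoding attributed above to Fermi can be sketched in a few lines of Python. The function names `bars_and_zeros` and `combinations_with_repetition` are hypothetical, chosen for this illustration; the brute-force count relies on the standard `itertools.combinations_with_replacement`.

```python
from itertools import combinations_with_replacement
from math import comb


def bars_and_zeros(combo: str, alphabet: str) -> str:
    """Encode a combination with repetition, e.g. 'aabbbeef' over 'abcdef', as zeros and bars."""
    # One group of "0" per letter of the alphabet (possibly empty), separated by n - 1 bars.
    return "|".join("0" * combo.count(letter) for letter in alphabet)


def combinations_with_repetition(n: int, p: int) -> int:
    """Number of unordered selections of p elements among n, repetitions allowed."""
    return comb(n + p - 1, p)


alphabet = "abcdef"
for combo in ("aabbbeef", "bbdddeee", "bbbddddd"):
    print(combo, "->", bars_and_zeros(combo, alphabet))

# Closed form versus explicit enumeration for n = 6, p = 8.
print(combinations_with_repetition(6, 8),
      len(list(combinations_with_replacement(alphabet, 8))))
```

Each encoded string always contains p zeros and n − 1 bars, which is why counting the combinations amounts to choosing the positions of the p zeros among n + p − 1 symbols.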






To summarize:

Type | Expression
Simple arrangement with repetitions | $\bar{A}_n^m = n^m$
Simple arrangement without repetitions | $A_n^m = \frac{n!}{(n-m)!}$
Simple permutation without repetitions | $A_n = n!$
Simple permutation with repetitions | $A_n(n_1, n_2, \ldots, n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}$
Simple combination without repetitions (case of the simple arrangement without repetitions where the order is not taken into account) | $C_m^n = \binom{n}{m} = \frac{A_n^m}{m!} = \frac{n!}{m!\,(n-m)!}$
Simple combination with repetitions (case of the simple permutation with repetition where the order is not taken into account) | $\Gamma_p^n = C_p^{n+p-1} = \frac{(n+p-1)!}{(n-1)!\,p!}$

Table 4.15 - Summary of the main Combinatorial Analysis cases


6.6 Markov Chains 

Markov chains are simple but powerful probabilistic and statistical tools, but the
choice of their mathematical presentation can sometimes be a nightmare... We will try here to
simplify the notation as much as possible to introduce this great tool, widely used in businesses to
manage supply chains, in queuing theory for call centers or cash desks, in failure theory for preventive
maintenance, in statistical physics and biological engineering, and also in time series analysis and
forecasting (the list goes on; for more details the reader should refer to the relevant
chapters of this book...).

Definitions (#80): 

D1. We denote by $\{X(t)\}_{t \in T}$ a probabilistic process, function of time, whose value at any time
depends on the outcome of a random experiment. Thus, at each time t, X(t) is a random
variable, and we name the whole family a "stochastic process" (for more details on financial
applications see the chapter Economy).

D2. If we consider a discrete time, we then denote the "discrete time stochastic process" by
$\{X_n\}_{n \in \mathbb{N}}$.

D3. If we further assume that the random variables X n can take only a discrete set of values 
we then speak of "process in discrete time and discrete space". 


It is quite possible, as in the study of communication flows (see section Quantitative
Management), to have a process in continuous time and discrete state space.

Definition (#81): $\{X_n\}_{n \in \mathbb{N}}$ is a "Markov chain" if and only if:

$$P(X_n = j \mid X_{n-1} = i_{n-1}, X_{n-2} = i_{n-2}, \ldots, X_0 = i_0) = P(X_n = j \mid X_{n-1} = i_{n-1}) \qquad (6.133)$$

in other words (it is very easy!) the probability that the chain is in a certain state at the n-th step
of the process depends only on the state of the process at step n − 1 and not on any previous
states.


Also, in probability theory, a stochastic process satisfies the Markov property if and only if the
conditional probability distribution of future states, given the present, depends
only on the present state and not on past states, as in the relation above. A process with
this property is also called a "Markov process".

Definition (#82): A "homogeneous Markov chain" is a chain such that the probability of
going to a certain state at the n-th step is independent of time. In other words, the probability
distribution characterizing the next step does not depend on the step number n: at all
times the same probability distribution characterizes the transition from the current state.

We can then define the "transition probability" from state i to state j by:

$$p_{ij} = P(X_n = j \mid X_{n-1} = i) \qquad (6.134)$$

It is then natural to define the "transition matrix" or "stochastic matrix":

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mn} \end{pmatrix}$$

as the matrix that contains all possible transition probabilities between states in an oriented
graph.

Markov chains can be represented graphically as an oriented graph G (see section Graph Theory),
sometimes named "automaton", having for vertices the states i and for edges the
oriented couples (i, j). We then associate to each component $p_{ij}$ an oriented arc and a transition
probability.

The reader has seen in the previous example that we have the trivial property (by construction!)
that the sum of the terms (probabilities) of a row of the matrix P is always equal to 1 (and therefore
the sum of the terms of a column of the transpose of the matrix is equal to 1 too):

$$\sum_j p_{ij} = 1 \qquad (6.137)$$

and that the matrix is positive (meaning that all its terms are non-negative).




Remember that the sum of the probabilities of the columns is always equal to 1 for the 
transpose of the stochastic matrix! 


The analysis of the transient state (or: random walk) of a Markov chain consists in determining (or
imposing!) the column matrix (vector) p(n) of the probabilities of being in each state at the n-th step of the walk:

$$p(n) = \begin{pmatrix} p_1(n) \\ p_2(n) \\ \vdots \\ p_m(n) \end{pmatrix} \qquad (6.138)$$

with the sum of the components always equal to 1 (since the sum of the probabilities of
being in any of the vertices of the graph at a given time/step must be equal to 100%).

We frequently name this column matrix "stochastic vector" or "probability measure on the vertices".

Theorem 4.28. The total probability of this stochastic vector is always
equal to 1.

Proof 4.28.1. If p(n) is a stochastic vector, then its image:

$$p(n+1) = P^T p(n) \qquad (6.139)$$

is also a stochastic vector. Indeed, $p_i(n+1) \geq 0$ because:

$$p_i(n+1) = \sum_j p_{ji}\, p_j(n) \qquad (6.140)$$

is a sum of positive or zero values. Furthermore, we find:

$$\sum_i p_i(n+1) = \sum_i \sum_j p_{ji}\, p_j(n) = \sum_j \sum_i p_{ji}\, p_j(n) = \sum_j \left( \sum_i p_{ji} \right) p_j(n) = \sum_j 1 \cdot p_j(n) = \sum_j p_j(n) = 1 \qquad (6.141)$$

□ Q.E.D.
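The statement just proved can be observed numerically. The following is a minimal pure-Python sketch; the 3-state matrix `P` is a hypothetical example invented for this illustration (it does not come from the book), and `step` is an illustrative helper name.

```python
def step(P, p):
    """Return p(n+1) = P^T p(n), i.e. p_i(n+1) = sum over j of p_ji * p_j(n)."""
    m = len(P)
    return [sum(P[j][i] * p[j] for j in range(m)) for i in range(m)]


# Hypothetical 3-state transition matrix: every row sums to 1.
P = [
    [0.50, 0.25, 0.25],
    [0.50, 0.00, 0.50],
    [0.25, 0.25, 0.50],
]
print([sum(row) for row in P])

p = [1.0, 0.0, 0.0]   # p(0): the walk starts in the first state with certainty
for n in range(1, 6):
    p = step(P, p)
    print(n, p, sum(p))   # the components change, their sum stays equal to 1
```

Starting from a state known with certainty, the first step simply returns the corresponding row of P, as expected from the definition of the transition probabilities.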

This probability vector, whose components are positive or zero, depends (it's pretty intuitive) on
the transition matrix P and the vector of initial probabilities p(0).

Although it is provable (Perron-Frobenius theorem), the reader may verify on a practical case
(computerized or not!) that for any stochastic matrix P there exists a probability vector,
traditionally denoted $\pi$ (unique when the chain is irreducible), such that:

$$P^T \pi = \pi \qquad (6.142)$$

Such a probability measure $\pi$ satisfying the above equation is called an "invariant measure" or
"stationary measure" or "balance measure" which represents the equilibrium state of the system.

In terms of linear algebra, $\pi$ is an eigenvector of $P^T$ for the eigenvalue 1
(see section Linear Algebra).
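One simple way to approximate an invariant measure, assuming the chain is irreducible and aperiodic so that the iteration converges, is to apply the relation p(n+1) = P^T p(n) repeatedly. The sketch below uses a hypothetical 3-state matrix invented for this illustration; the function name `stationary` and the number of iterations are arbitrary choices.

```python
def stationary(P, iterations=200):
    """Approximate pi such that P^T pi = pi by iterating p(n+1) = P^T p(n)."""
    m = len(P)
    pi = [1.0 / m] * m                      # any stochastic vector can serve as p(0)
    for _ in range(iterations):
        pi = [sum(P[j][i] * pi[j] for j in range(m)) for i in range(m)]
    return pi


# Hypothetical 3-state transition matrix: every row sums to 1.
P = [
    [0.50, 0.25, 0.25],
    [0.50, 0.00, 0.50],
    [0.25, 0.25, 0.50],
]

pi = stationary(P)
print(pi)        # close to (0.4, 0.2, 0.4) for this particular matrix
print(sum(pi))   # still a probability vector
```

For this matrix the fixed point can also be found by hand by solving the linear system P^T π = π together with the constraint that the components sum to 1. A dedicated eigenvector routine from a linear algebra library would be the usual choice for large matrices.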

We will see a trivial example in the Graph Theory section, which will be developed in detail
in the section Game and Decision Theory in the context of pharmaco-economics and in
the section Software Engineering when we study the fundamentals of the Google PageRank
algorithm. Note also that Markov chains are used, for example, in meteorology (or
for cracking computer passwords),

or in medicine, finance, transportation, marketing, etc.

In the field of language analysis, from the frequency analysis of sequences of words, computers
are also able to build Markov chains and therefore to propose more relevant suggestions during
computerized grammatical corrections or in the written transcription of oral presentations.

Definitions (#83): 

Dl. A Markov chain is said to be an "irreducible Markov chain" if all states are bound to 
others (it’s the case of the example in the figure above). 

D2. A Markov chain is said to be an "absorbing Markov chain" if one of the states of the chain 
absorbs the transitions (so nothing comes out just to say things in a more simple way!). 



Statistics is a science that concerns the systematic grouping of facts or recurring events
that lend themselves to a numerical or qualitative assessment over time according to a
given law. In industry and the economy in general, statistics is a science that helps
to make valid inferences in an uncertain environment.

You should know that among all areas of mathematics, the one that is used the most widely in
business and research centres is Statistics, especially since software greatly facilitates the
calculations! This is why this chapter is one of the biggest in this book even if only the basic
concepts are presented!

Note also that Statistics have a very bad reputation at the university because the notations are 
often confusing and vary greatly from one teacher to another, from one book to another, from 
one practitioner to another. Strictly speaking, it should comply with the vocabulary and notation 
of the ISO 3534-1:2006 norm and unfortunately this chapter was written before the publication 
of this standard ... a certain period of adaptation will be necessary to obtain the full compliance. 

It is perhaps needless to specify that Statistics is widely used in engineering, theoretical physics,
fundamental physics, econometrics, project management and the process industry, in the
fields of life and non-life insurance, in actuarial science or in database analysis (with
Microsoft Excel very often ... unfortunately ....) and the list goes on. We will also meet quite
often the tools presented here in the chapters of Fluid Mechanics, Thermodynamics, Technical
Management, Industrial Engineering and Economy (especially in the last two). The reader can
then refer to them to find some concrete practical applications of the most important theoretical
elements that will be seen here.

Note also that in addition to a few simple examples on these pages, many other application 
examples are given on the exercise server of the companion website in the categories Probability 
and Statistics, Industrial Engineering, Econometrics and Management Techniques. 

Definition (#84): The main purpose of Statistics is to determine the characteristics of a given
population from the study of a part of the population, called "sample" or "representative
sample". The determination of these characteristics should enable statistics to be a
decision support tool!

The processing of data concerns "descriptive statistics". The interpretation of data from
estimators is called "statistical inference" (or "inferential statistics"), and the analysis of
mass data "frequentist statistics", as opposed to Bayesian inference (see section Probabilities).

When we observe an event taking into account some given factors, it can happen that a second
observation takes place in conditions that seem identical. By repeating these steps several
times on different supposedly similar objects, we find that the observed results are statistically
distributed around a mean value that is ultimately the most likely outcome. In practice,
however, we sometimes perform a single measurement, and the goal is then to determine the
value of the error we make by adopting it as the average measure. This determination requires
knowledge of the type of statistical distribution we are dealing with, and that is what we will
focus on (among other things) here (at least the basics!). However, there are several common
methodological approaches when we face randomness (less common ones are not mentioned yet):

1. A first approach is to simply ignore the random elements, for the simple reason that we do not know
how to integrate them. We then use the "scenarios method", also called "deterministic
simulation". This is typically the tool used by financial managers and managers without a degree,
with tools like Microsoft Excel (which includes a scenario management tool)
or MS Project (which includes a tool to manage the deterministic optimistic, pessimistic
and expected scenarios).

2. A second possible approach, when we do not know how to associate probabilities with
specific future random events, is game theory (see section Game and Decision Theory),
where semi-empirical selection criteria are used, such as the maximax, minimax,
Laplace and Savage criteria, etc.

3. Finally, when we can link probabilities to random events, whether these probabilities
are derived from calculations or measurements, or whether they are based on experience from
previous situations similar to the current one, we can use descriptive and inferential
statistics (contents of this chapter) to obtain usable and relevant information from this
mass of acquired data.

4. A last approach, when we know the relative probabilities of the events occurring in
response to strategic choices, is the use of decision theory (see section Game and Decision
Theory).


R1. Without mathematical statistics, a calculation on data (e.g. an average) is only a
"point indicator". It is mathematical statistics that gives it the status of an estimator
whose bias, uncertainty and other statistical characteristics are controlled. We generally
seek to ensure that the estimator is unbiased, convergent and efficient (we will see during
our study of estimators further on exactly what all this means).

R2. When we communicate a statistic, it should be mandatory to specify the
confidence interval, the p-value and the size of the studied sample (absolute statistics)
together with its detailed characteristics, and to make the sources and the data protocol available;
otherwise the statistic has almost no scientific value (we will see all these concepts in detail further
below). A common mistake is to communicate in relative values. For example, in a test
group of 1,000 women, 5 women will die from breast cancer without screening, and
4 will die with screening. Some will say a little too quickly (typically
physicians....) that screening saves 20% of women (a relative value, as one
of the five could have been saved...). In fact this is misleading, since the absolute benefit of
screening is very small: 1 woman in 1,000, i.e. 0.1%!

R3. If you have a teacher or trainer who dares to teach you statistics and probabilities
only with examples based on gambling (cards, dice, matches, coins, etc.), get rid of
him or report him. Normally examples should be based on industry, the economy or
R&D, i.e. on areas that businesses deal with daily!
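The difference between relative and absolute values in remark R2 can be made explicit with a few lines of Python. The function name `risk_reduction` is an illustrative choice; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def risk_reduction(deaths_without: int, deaths_with: int, group_size: int):
    """Return (absolute, relative) risk reduction as exact fractions."""
    risk_without = Fraction(deaths_without, group_size)
    risk_with = Fraction(deaths_with, group_size)
    absolute = risk_without - risk_with      # difference of the two risks
    relative = absolute / risk_without       # same difference, relative to the baseline risk
    return absolute, relative


absolute, relative = risk_reduction(5, 4, 1000)
print(f"absolute reduction: {float(absolute):.1%}")   # 1 woman in 1,000
print(f"relative reduction: {float(relative):.0%}")   # 1 death avoided out of 5
```

Both numbers describe the same data; only the denominator changes, which is why a statistic communicated without the sample size and the baseline risk is hard to interpret.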

7.1 Samples 

During the statistical analysis of sets of information, the way the sample is selected is as important
as the way it is analyzed. The sample must be representative of the population (we do not necessarily
refer to human populations!). For this, random sampling is the best way to achieve it.
Definitions (#85): 

Dl. The statistician always starts from the observation of a finite number of elements, which 
we name the "population". The observed elements, in quantity n, are all of the same 
nature, but this nature can be very different from one population to another. 

D2. We are in the presence of a "quantitative character" when each observed element is ex- 
plicitly subject to the same measure. To a given quantitative character, we associate a 
"quantitative variable" continuous or discrete, which summarizes all the possible val- 
ues that the measure can take (such information being represented by functions like the 
Gauss-Laplace distribution, the beta distribution, the Poisson distribution, etc.). 


We will come back on the concept of "variable" and "distribution" a little further... 


D3. We are in the presence of a "qualitative character" when each observed element is
explicitly attached to a single "modality" from a set of exclusive modalities
(e.g.: man | woman) that makes it possible to classify all studied elements from a given point
of view (such information being represented by bar charts, sector charts, bubble charts,
etc.). All modalities of a character can be established a priori before the survey (a list,
a nomenclature, a code) or after the survey. A study population can be represented by a
mixed character, or set of modalities, such as gender, wage range, age, number of children
and marital status for an individual.

D4. A "random sample" is by default (without more precision) a sample in which all indi- 
viduals in a population have the same chance, or "equally likely probability" (and we 
emphasize that this probability must be equal), to end up in the sample. 

D5. Conversely, when the elements of a sample were not chosen randomly, we talk about
a "biased sample" (in the opposite case we talk about a "non-biased sample").


A small representative sample is by far preferable to a large biased sample. But
when the sample size is small, chance can give a result worse than the biased one.

7.2 Averages 

The concept of "average" or "central tendency" (financial analysts call it a "measure of
location"...) is, with the notion of "variable", at the basis of statistics.

This notion seems very familiar to us and we talk a lot about it without asking too many
questions. But there are various qualifiers (we emphasize that these are only qualifiers!) to distinguish
the way a problem of calculating an average is solved.

Thus, you must be very careful about the calculation of averages, because there is a
tendency in business to rush and to systematically use the arithmetic mean without thinking, which
can lead to serious errors! A nice example (as an analogy) is that a considerable number of laws
limit only the average level of pollution per year, while, for example, smoking one cigarette per
day during 365 days does not have the same impact as smoking 365 cigarettes in a single day of
the year, although both have the same average taken over a year ... This is clear evidence of
the statistical incompetence of the legislature.

Here is a small sample of common mistakes: 

• Consider that the arithmetic mean is the value that divides the population into two equal 
parts (although it is the median that does this). 

• Consider that the average of ratios of the type goals/realisations is equal to the ratio of
the average of the goals to the average of the realisations (although it is not the same
thing).

• Consider that the average salaries of different subsidiaries, is equal to the global average 
(while this is true if and only if there is the same number of employees in each subsidiary 
of the company). 

• Consider that the average of the average of the rows in a table is always equal to the 
average of the columns of the same table (although this is true if and only if the cell 
contents are not empty). 

• Calculate the arithmetic average growth of the revenue in % (as the geometric mean must 
be used). 

• etc. 
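Two of the mistakes listed above can be illustrated with a short Python sketch. The salary figures and growth rates below are invented for the illustration; `statistics.geometric_mean` requires Python 3.8 or later.

```python
from statistics import geometric_mean, mean

# Hypothetical salaries in two subsidiaries of different sizes.
salaries = {"subsidiary A": [4000, 4200, 4400], "subsidiary B": [9000]}

mean_of_means = mean(mean(values) for values in salaries.values())
global_mean = mean(x for values in salaries.values() for x in values)
print(mean_of_means, global_mean)   # they differ because the group sizes differ

# Hypothetical revenue growth: +50% one year, then -50% the next year.
factors = [1.5, 0.5]
arithmetic_growth = mean(factors) - 1          # suggests 0% average growth
geometric_growth = geometric_mean(factors) - 1 # growth rate that reproduces the final revenue
final_revenue = 100 * factors[0] * factors[1]  # starting from 100
print(arithmetic_growth, geometric_growth, final_revenue)
```

With equal group sizes the two salary averages would coincide. For the growth rates, only the geometric mean, applied twice, gives back the actual final revenue.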

We will see below different averages, with examples relating to arithmetic, enumeration,
physics, econometrics, geometry and sociology. The reader will find other practical
examples by browsing the entire book.


Definitions (#86): Given real numbers $x_i$, we have:

D1. The "arithmetic average" or "sample average" (the most commonly known) is defined as
the quotient of the sum of the n observed values $x_i$ by the total size n of the sample:

$$\mu_a = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (7.1)$$

and is very often written $\bar{x}$ or $\hat{\mu}$, and is for any discrete or continuous statistical distribution
an unbiased estimator of the mean.

The arithmetic average represents a statistical measure expressing the magnitude that 
would have each member of a set of measures if the total must be equal to the product of 
the arithmetic average by the number of members. 

If some values repeat more than once in the measurements, the arithmetic mean is then
often formally written as follows:

$$\mu_a = \frac{1}{n} \sum_{i=1}^{n} n_i x_i = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \qquad (7.2)$$

and is named "weighted average". Finally, we could indicate that under this approach, the
weighted average will be named "mathematical mean" or just "mean" in the field
of probabilities.

We may as well use the frequencies of occurrence of the observed values, named "class
frequencies":

$$f_i = \frac{n_i}{n} \qquad (7.3)$$

So that we get another equivalent definition named the "weighted average by the class
frequencies":

$$\mu_a = \frac{1}{n} \sum_{i=1}^{n} n_i x_i = \sum_{i=1}^{n} \frac{n_i}{n} x_i = \sum_{i=1}^{n} f_i x_i \qquad (7.4)$$
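The equivalence of these three ways of writing the arithmetic average can be checked on a small data set. The values below are invented for the illustration, and exact fractions are used so that the three results can be compared without rounding issues.

```python
from collections import Counter
from fractions import Fraction

data = [2, 3, 3, 5, 5, 5, 7]   # hypothetical measurements with repeated values
n = len(data)
counts = Counter(data)          # n_i: number of occurrences of each distinct value x_i

# Plain arithmetic average: (1/n) * sum of all observations.
mean_plain = Fraction(sum(data), n)

# Weighted by the counts: (1/n) * sum of n_i * x_i over the distinct values.
mean_counts = Fraction(sum(n_i * x_i for x_i, n_i in counts.items()), n)

# Weighted by the frequencies f_i = n_i / n: sum of f_i * x_i.
mean_freq = sum(Fraction(n_i, n) * x_i for x_i, n_i in counts.items())

print(mean_plain, mean_counts, mean_freq)
```

Here the sums run over the distinct values only, each one weighted by its number of occurrences; the frequencies themselves sum to 1.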

Before continuing, it is important to know that in the field of statistics it is useful and often
necessary to group measurements/data into class intervals of a given width (see examples
below). We often have to make several tries to choose the intervals, even if there are
semi-empirical formulas for choosing the number of classes when we have n available values.
One of these semi-empirical rules, used by many practitioners, is to retain the smallest
integer number k of classes such that:

$$2^k \geq n$$

The width of the class interval is then obtained by dividing the range (difference between
the maximum and minimum measured values) by k. By convention and rigorously... (so
rarely respected in the notations), a class interval is closed on the left and open on the
right (see section Functional Analysis), that is, of the form $[a, b)$.

This empirical rule is called the "Sturges rule" and is based on the following reasoning:

We assume that the values of the binomial coefficient $C_i^k$ give the number of individuals
in the i-th interval of an ideal histogram of k intervals (we let the reader check this simply
with spreadsheet software like Microsoft Excel 11.8346 and the COMBIN(k,i) function).
As k becomes large, the histogram looks more and more like a continuous
curve called the "Normal curve" or "bell curve", as we will see later.

Therefore, based on the binomial theorem (see section Calculus), with a = b = 1 we have:

$$n = \sum_{i=0}^{k} C_i^k = \sum_{i=0}^{k} C_i^k\, 1^{k-i}\, 1^{i} = \sum_{i=0}^{k} C_i^k\, a^{k-i}\, b^{i} = (a+b)^k = (1+1)^k = 2^k$$

Then, for each interval i the practitioner will traditionally take the average between the 
lower and upper limit for the calculation and multiply it by the corresponding class fre- 
quency fi. Therefore, the grouping of class frequencies implies that: 

(a) The weighted average by the class frequencies differs from the arithmetic average.

(b) Being an approximation, it is a worse indicator than the arithmetic
average...

(c) It is very sensitive to the choice of the number of classes, and therefore performs poorly in this respect.
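The class-grouping procedure described above can be sketched as follows. This is a hedged illustration only: the function names `sturges_classes` and `grouped_mean`, the sample data, and the choice of putting the maximum value in the last class are assumptions made here, and the data are assumed not to be all equal (otherwise the class width would be zero).

```python
def sturges_classes(n: int) -> int:
    """Smallest integer k such that 2**k >= n (the rule quoted in the text)."""
    k = 0
    while 2 ** k < n:
        k += 1
    return k


def grouped_mean(data, k: int) -> float:
    """Average computed from k equal-width classes, using class midpoints and frequencies."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k                       # range divided by the number of classes
    counts = [0] * k
    for x in data:
        i = min(int((x - lo) / width), k - 1)   # classes are [a, b); the maximum goes in the last one
        counts[i] += 1
    n = len(data)
    return sum((c / n) * (lo + (i + 0.5) * width) for i, c in enumerate(counts))


data = [1, 2, 2, 3, 4, 4, 5, 9]   # hypothetical measurements
k = sturges_classes(len(data))
print(k, grouped_mean(data, k), sum(data) / len(data))
```

On this small data set the grouped average and the exact arithmetic average are visibly different, which illustrates points (a) and (b); rerunning `grouped_mean` with another value of k illustrates point (c).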

There are many other empirical rules for the discretization of random variables. For
example, the software XLStat offers no fewer than 10 rules (constant amplitude, Fisher
algorithm, k-means, 20/80, etc.).

Later, we will see two very important properties of the arithmetic average and of the mean 
that you will have to understand absolutely (the weighted average of deviations from the 
average and the average deviations from the average). 


The "mode", denoted Mod or simply $M_0$, is defined as the value that appears most
often in a set of data. In Microsoft Excel 11.8346, it is important to know that the
MODE() function returns the first value, in the order of the values, among those having the largest
number of occurrences, therefore assuming a unimodal distribution.
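The same caveat about multimodal data applies outside spreadsheets. As an illustration in Python (version 3.8 or later is assumed), the standard `statistics.mode` function returns only the first mode encountered, while `statistics.multimode` returns all of them; the data below are invented for the example.

```python
from statistics import mode, multimode

data = [1, 3, 3, 2, 2]      # hypothetical bimodal data: 3 and 2 both appear twice

print(mode(data))           # only one value is returned, the first mode encountered
print(multimode(data))      # every value sharing the largest number of occurrences
```

Checking `multimode` first is a simple way to detect that a data set is not unimodal before reporting a single mode.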

D2. The "median" or "middle value", denoted $M_e$, is the value that cuts the population values
into two equal parts. In the case of a continuous statistical distribution f(x) of a random
variable X, it is the value that has a 50% cumulative probability
of occurring (we will see further in detail the concept of statistical distribution):

$$P(X \leq M_e) = P(X \geq M_e) = \int_{-\infty}^{M_e} f(x)\,dx = \int_{M_e}^{+\infty} f(x)\,dx = 0.5$$

In the case of a series of ordered values $x_1, x_2, \ldots, x_n$, the median is therefore by
definition the value such that the number of values greater than or
equal to it is the same as the number of values less than or equal to it.



R1. The median is mainly used for skewed distributions, because it represents them
better than the arithmetic average.

R2. The median is in practice often not a single value (at least in the case
where n is even). Indeed, between the values corresponding to the ranks $\frac{n}{2}$ and $\frac{n}{2} + 1$
there is an infinite number of values that cut the population in half.

More rigorously: 

• If the number of terms is odd, i.e. of the form 2n + 1, the median of the series is the
term of rank n + 1 (whether the terms are all distinct or not!).

• If the number of terms is even, i.e. of the form 2n, the median of the series is the
half-sum (arithmetic average) of the values of the terms of rank n and n + 1 (whether the
terms are all distinct or not!).

In any case, by this definition, it follows that at least 50% of the terms of the
series are smaller than or equal to the median, and at least 50% of the terms of the
series are greater than or equal to the median.
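The two rules above translate directly into code. The following sketch uses an illustrative function name, `median_of_series`, and invented data; the result is compared with the standard `statistics.median`, which follows the same convention.

```python
from statistics import median


def median_of_series(values):
    """Median following the two rules above (odd and even number of terms)."""
    ordered = sorted(values)
    count = len(ordered)
    middle = count // 2
    if count % 2 == 1:
        # 2n + 1 terms: the term of rank n + 1 (index n when counting from 0).
        return ordered[middle]
    # 2n terms: half-sum of the terms of rank n and n + 1.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_series = [5, 1, 3]
even_series = [4, 1, 3, 2]
print(median_of_series(odd_series), median(odd_series))
print(median_of_series(even_series), median(even_series))
```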

For example, consider the table of wages below: 

Employee N° Wage Cumulated Employees % Cumulated Employees 





































































Table 4.16 - Identification of the median 

There is in the table an odd number 2n + 1 of values. So the median of the series is the
term of rank n + 1. This is 1,600.— (the result that any spreadsheet software gives). The
arithmetic average is in this case about 2,020.—.

In direct relation with the median it is important to define the following concept to under- 
stand the underlying mechanism: 



Definition (#87): Given a statistical series $x_1, x_2, \ldots, x_i, \ldots, x_n$, we name "dispersion
of absolute differences" around x the number e'(x) defined by:

$$e'(x) = \sum_{i=1}^{n} |x_i - x| \qquad (7.9)$$

e'(x) is minimum for the value of x closest to the values $x_i$ in the sense of the absolute
error. The median is the value that achieves this minimum (extremum)! The idea
will then be to study the variations of the function to find the position of this extremum.

Indeed, we can write:

$$\forall x \in [x_r, x_{r+1}],\ r \in \{1, 2, 3, \ldots, n-1\}: \quad e'(x) = \sum_{i=1}^{r} |x_i - x| + \sum_{i=r+1}^{n} |x_i - x|$$

Then, by definition of the value x:

$$e'(x) = \sum_{i=1}^{r} (x - x_i) + \sum_{i=r+1}^{n} (x_i - x)$$

$$= [r x - (x_1 + x_2 + \ldots + x_r)] + [(x_{r+1} + \ldots + x_n) - (n-r)x]$$

$$= (2r - n)x + (x_{r+1} + \ldots + x_n) - (x_1 + x_2 + \ldots + x_r)$$

What allows us to skip the absolute values is simply the choice of the index r that is taken 
so that the serie of values in practice can always be split into two parts: all that is less than 
the element indexed by r and all that is superior to it (i.e.. the median by anticipation...). 

e'(x) is therefore a piecewise affine function (on each interval it has the form of the 
equation of a line, for fixed values of r and n), where we see by analogy that the factor: 

$$2r - n$$

is the slope of the function and: 

$$(x_{r+1} + \ldots + x_n) - (x_1 + x_2 + \ldots + x_r)$$

the y-intercept (ordinate at the origin). 

The function is decreasing (negative slope) as long as r is less than n/2 and increasing when 
r is greater than n/2 (it passes through an extremum!). Specifically, we distinguish two 
cases of particular interest, since n is an integer: 

• If n is even, we can write n = 2n'. The slope can then be written 2(r - n') and 
it is equal to zero if r = n'. As the result is valid by construction only 
for $x \in [x_r, x_{r+1}]$, e'(x) is constant on $[x_{n'}, x_{n'+1}]$: the minimum is reached 
everywhere on this interval and by convention we take its middle (arithmetic average of the two terms). 

• If n is odd, we can write n = 2n' + 1 (we cut the series into two equal parts around 
the middle term). The slope can then be written (2r - 2n' - 1) and it is zero if r = n' + 1/2. 
As r is an integer and the result is only valid for $x \in [x_r, x_{r+1}]$, the slope changes 
sign between r = n' and r = n' + 1, so it is immediate that the middle value is the 
median $x_{n'+1}$. 


We find out the median in both cases. We will also see later how the median is defined 
for a continuous random variable (the underlying idea is the same). 
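The minimizing property proved above can be checked numerically. The following is only a small sketch: the two series are hypothetical (they are not the wage table of the text), and the helper names `abs_dispersion` and `grid_minimum` are introduced here for illustration.

```python
from statistics import median

def abs_dispersion(x, series):
    """Sum of absolute differences e'(x) between x and every term of the series."""
    return sum(abs(xi - x) for xi in series)

def grid_minimum(series, step=0.5):
    """Smallest value of e'(x) found on a regular grid covering the series."""
    lo, hi = min(series), max(series)
    n_steps = int((hi - lo) / step)
    grid = [lo + k * step for k in range(n_steps + 1)]
    return min(abs_dispersion(x, series) for x in grid)

# Hypothetical series: one with an odd and one with an even number of terms.
odd_series = [1, 3, 4, 8, 20]
even_series = [1, 3, 4, 8, 20, 50]

for s in (odd_series, even_series):
    m = median(s)
    # The dispersion at the median equals the smallest dispersion on the grid.
    print(m, abs_dispersion(m, s), grid_minimum(s))
```

In the even case the dispersion takes the same value at 4, 6 and 8, which illustrates that e'(x) is constant between the two middle terms.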

There is another practical case, where the statistician has at his disposal only values 
grouped in intervals of statistical classes. The procedure for determining the median is 
then different: 

When only values grouped in class intervals are available, the 
abscissa of the median generally lies within a class. To get a more accurate 
value of the median, we then perform a linear interpolation. This is what we name the "linear 
interpolation method of the median". 

The median value can be read from the graph or calculated analytically. Indeed, consider 
the graph of the cumulative probability F(x) in class intervals as below where the bounds 
of the intervals were connected by straight lines: 






Figure 4.67 - Graphical representation of the estimation of the median by linear interpolation 

The value of the median $M_e$ is obviously the abscissa at which the interpolated curve crosses the cumulated 
probability of 50% (0.5). Thus, by applying the basics of functional 
analysis, we have (just by observing that the slope in the interval containing the median 
is the same to the left and to the right of the median): 

$$\frac{\Delta x}{\Delta y} = \frac{M_e - 2}{0.5 - 0.2} = \frac{4 - 2}{0.7 - 0.2}$$

What we frequently write, for a median class [a, b]: 

$$\frac{M_e - a}{0.5 - F(a)} = \frac{b - a}{F(b) - F(a)}$$

Thus the value of the median: 

$$M_e = a + (b - a)\,\frac{0.5 - F(a)}{F(b) - F(a)}$$

Consider the following table that we will see again much later in this chapter: 




Column headers: [...] of tickets, [...] of tickets, Cumulated number of tickets, Relative frequencies of tickets 
Class intervals (surviving rows): [100, 150[ ... [400 and + 
Table 4.17 - Identification of the median and the mode 

We see that the "median class" is the range [150, 200] because the cumulative value 
of 0.5 is reached there (column at the right of the table). Using the previously 
established relation, the median has the precise value (it is trivial in the particular example of the table 
above, but we still do the calculation...): 

$$M_e = 150 + (200 - 150)\,\frac{0.5 - 0.3805}{0.5 - 0.3805} = 200 \tag{7.17}$$

and of course we can do the same with any other percentile! 
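The interpolation relation above can be sketched as a short function. This is a minimal illustration only: the name `interpolated_quantile` and its parameters are assumptions introduced here, and the target level p is left free since the same relation applies to any percentile.

```python
def interpolated_quantile(p, a, b, F_a, F_b):
    """Linear interpolation of the p-quantile inside the class [a, b],
    given the cumulative frequencies F(a) <= p <= F(b)."""
    if not F_a <= p <= F_b or F_a == F_b:
        raise ValueError("p must lie in [F(a), F(b)] with F(a) < F(b)")
    return a + (b - a) * (p - F_a) / (F_b - F_a)

# Median of the ticket example of the text: class [150, 200], F(150) = 0.3805, F(200) = 0.5
print(interpolated_quantile(0.5, 150, 200, 0.3805, 0.5))

# Values read on the interpolation figure: class [2, 4], F(2) = 0.2, F(4) = 0.7
print(interpolated_quantile(0.5, 2, 4, 0.2, 0.7))
```

The first call reproduces the value 200 of relation (7.17); the second gives a median close to 3.2 for the values of the figure.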

We can also give a relation to determine the modal value when we only have 
the frequencies of the class intervals. To see this we start with the diagram below, a bar chart of the 
"grouped distribution" in frequencies: 

Figure 4.68 - Graphical representation of the estimation of the modal value with classes intervals 

Using Thales relations (see section Euclidean Geometry), it comes immediately, noting 
M the modal value: 

$$\frac{M - x_i}{\Delta_1} = \frac{x_{i+1} - M}{\Delta_2}$$


As in a proportion we do not change the value of the ratio by adding the numerators and 
adding the denominators, we get: 

$$\frac{M - x_i}{\Delta_1} = \frac{x_{i+1} - M}{\Delta_2} = \frac{x_{i+1} - x_i}{\Delta_1 + \Delta_2}$$

We then have: 

$$M = x_i + \frac{\Delta_1}{\Delta_1 + \Delta_2}\,(x_{i+1} - x_i) \tag{7.20}$$

With the previous example this gives then: 

$$M = 150 + \frac{(1{,}915 - 1{,}498)}{(1{,}915 - 1{,}498) + (1{,}915 - 1{,}498)}\,(250 - 150) = 150 + \frac{1}{2}(250 - 150) = 150 + \frac{1}{2}\cdot 100 = 200$$
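Relation (7.20) translates directly into code. The function name `interpolated_mode` and its argument names are hypothetical, chosen here only for illustration.

```python
def interpolated_mode(x_lo, x_hi, delta_1, delta_2):
    """Modal value inside the modal class [x_lo, x_hi], where delta_1 and delta_2
    are the frequency differences with the previous and the next class."""
    if delta_1 + delta_2 == 0:
        raise ValueError("delta_1 + delta_2 must be non-zero")
    return x_lo + delta_1 / (delta_1 + delta_2) * (x_hi - x_lo)

# Numbers of the example of the text: both differences equal 1,915 - 1,498
print(interpolated_mode(150, 250, 1915 - 1498, 1915 - 1498))
```

With equal differences on both sides the modal value falls in the middle of the class, here 200.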


The question that then arises is the appropriateness of the choice of the mean, mode 
or median in terms of communication... (normally we communicate all three in 
corporate reports!). 

A good example is that of the labor market where, in general, the average wage and 
the median wage are quite different: the institutions of state statistics calculate the median, 
which many traditional media then explicitly equate with the concept of "arithmetic average" 
in their news... 


To avoid getting an arithmetic average that makes little sense, we often calculate a 
"trimmed average", i.e. an arithmetic average calculated after removing outliers from 
the series (using Grubbs or Dixon Tests). 

The "quantiles" generalize the notion of median by cutting the distribution into sets of equal 
size (of the same cardinality we might say...) or in other words into regular intervals. 
We define the "quartiles", the "deciles" and the "percentiles" on the population, ordered in 
ascending order, that we divide into 4, 10 or 100 parts of the same size. 

So we talk about the 90th percentile to indicate the value separating the first 90% of the 
population from the remaining 10%. 

Note that in Microsoft Excel 11.8346 the functions QUARTILE( ), PERCENTILE( ), 
MEDIAN( ), PERCENTRANK( ) are available. It is useful to specify that 
there are several variants for calculating these percentiles, which explains possible variations 
between the results of different spreadsheet software. 
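The existence of several variants can be illustrated with the Python standard library (version 3.8 or later), whose `statistics.quantiles` function offers an "exclusive" and an "inclusive" method. The data set below is hypothetical and the sketch makes no claim about which variant a given spreadsheet uses.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical ordered population

# Two common conventions for the quartiles of the same data
q_exclusive = quantiles(data, n=4, method="exclusive")
q_inclusive = quantiles(data, n=4, method="inclusive")

print(q_exclusive)   # first and third quartiles depend on the convention
print(q_inclusive)
print(median(data))  # the second quartile (median) is the same in both cases
```

The first and third quartiles differ between the two conventions while the median does not, which is the kind of discrepancy one can observe between software packages.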

This concept is very important in the context of confidence intervals, which we will see much 
further in this section, and very useful in the field of quality with the use of "box plots" 
(also named "Box & Whiskers plots") to quickly compare ("discriminate" as experts say) 
two or more populations of data and especially to eliminate outliers (taking the median as 
reference will just make more sense!). 



Figure 4.69 - Box & Whiskers Plot painfully made with Microsoft Excel 11.8346 

Or more explicitly as it should be in any good statistical software: 

[Figure labels: Analysis of Train Arrival Delay; Extreme Outlier; 3rd Quartile; Light Outlier; 1st Quartile] 

Figure 4.70 - Box & Whiskers Plot ideal chart type 

Another very important mental representation of box plots is the following (it gives an 
idea of the asymmetry of the distribution, as the R software is able to do): 



[Figure labels: Q1, M, Q3 and Mode, for three distributions] 

Figure 4.71 - Graphical representation of the mode, median and 1st + 3rd quartile compared to a distribution 

The concepts of median, outliers and confidence intervals that have so far been proved 
and/or just defined are so significant that there exist international standards for their 
proper use. First let us cite the norm ISO 16269-7:2001 Median - Estimation and confidence 
intervals and also the norm ISO 16269-4:2010 Detection and treatment of outliers. 

D3. By analogy with the median, we define the "medial" as the value (in ascending order of 
values) that splits the (cumulative) sum of the values into two equal masses (i.e. the total sum 
divided by two). 

In the case of our wages example, while the median gives the wage such that 50% of the salaries are 
below and 50% above, the medial gives how many employees (and therefore which 
wage) share the first half and how many employees share the second half of the total 
wage costs. 

Employee N° Wage Cumulated Wages % Cumulated Wages 





































































Table 4.18 - Identification of the medial 

The sum of all wages is equal to 34,340 and half of this sum is therefore 17,170: the 
medial lies between the employees No. 11 and 12, while the median was 1,600. We see 
then that the medial corresponds to 50% of the aggregate. This is a very useful indicator 
in Pareto or Lorenz analysis (see section Quantitative Management). 
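The difference between median and medial can be sketched as follows. The wages are hypothetical (the table of the text is not reproduced here) and the convention used, returning the first term at which the cumulative sum reaches half of the total, is only one possible reading of the definition.

```python
from itertools import accumulate
from statistics import median

def medial_position(values):
    """Return (rank, value) of the first term, in ascending order, at which
    the cumulative sum reaches half of the total sum."""
    ordered = sorted(values)
    half = sum(ordered) / 2
    for rank, (value, running) in enumerate(zip(ordered, accumulate(ordered)), start=1):
        if running >= half:
            return rank, value
    raise ValueError("empty series")

# Hypothetical wages with one very high value
wages = [1000, 1200, 1500, 1600, 2000, 3000, 9700]

print(median(wages))           # splits the employees into two equal groups
print(medial_position(wages))  # splits the total wage cost into two equal masses
```

With one very high wage, the medial sits much higher in the ranking than the median, which is what makes it useful in concentration analyses.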

D4. The "root mean square", sometimes denoted simply Q, which comes from the general relation: 

$$\mu_m = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^m\right)^{1/m}$$

but where we take m = 2. 



Consider a square of side a, and another square of side b. The average area 
of the two squares equals the area of one square of side q: 

$$q^2 = \frac{a^2 + b^2}{2} \Rightarrow q = \sqrt{\frac{a^2 + b^2}{2}}$$


In Microsoft Excel 11.8346 you can combine the functions SUMSQ( ) and COUNT( ) to 
quickly calculate the root mean square as follows: 

=(SUMSQ(...)/COUNT(...))^(1/2) 

D5. The "harmonic mean", sometimes simply denoted H, is defined by: 

$$\frac{1}{\mu_h} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{x_i}$$

It is little known but is often the result of simple and relevant arguments (typically the 
equivalent resistance of an electrical circuit with several resistors in parallel). There is a 
HARMEAN( ) function in Microsoft Excel 11.8346 to calculate it. 


Consider a distance d travelled in one direction at the speed $v_1$ and in the 
other direction (or not) at the speed $v_2$. The average speed is 
obtained by dividing the total distance 2d by the total time t of the travel: 

$$v = \frac{2d}{t}$$

The time it takes to travel d at a speed $v_i$ is simply the quotient: 

$$t_i = \frac{d}{v_i}$$

The total time is therefore: 

$$t = \frac{2d}{v} = t_1 + t_2 = \frac{d}{v_1} + \frac{d}{v_2} \Rightarrow \frac{1}{v} = \frac{1}{2}\left(\frac{1}{v_1} + \frac{1}{v_2}\right)$$

The distance d disappears from the result, which depends only on the two speeds. 

In other words: we use the harmonic mean when the data are given to us as ratios (rates). 
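The round trip example can be verified with a few lines of code. The numerical values (distance and speeds) are hypothetical; `statistics.harmonic_mean` is the standard library function.

```python
from statistics import harmonic_mean, mean

d = 120.0             # hypothetical one-way distance
v1, v2 = 40.0, 60.0   # hypothetical speeds for each leg

total_time = d / v1 + d / v2        # t = t1 + t2
average_speed = 2 * d / total_time  # total distance divided by total time

print(average_speed)            # true average speed over the round trip
print(harmonic_mean([v1, v2]))  # harmonic mean of the two speeds
print(mean([v1, v2]))           # the arithmetic average overestimates it
```

The true average speed coincides with the harmonic mean of the two speeds (48 here) and not with their arithmetic average (50).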



D6. The "geometric mean", sometimes simply denoted G, is defined in the general case by: 

$$\mu_g = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

This average is often forgotten by undergraduate employees but it is famous in the 
field of finance (see section Economy); this is also why there is a GEOMEAN( ) function 
in Microsoft Excel 11.8346 to calculate it. 

A geometric mean is often used when comparing different items - finding a single "fig- 
ure of merit" for these items - when each item has multiple properties that have different 
numeric ranges. For example, the geometric mean can give a meaningful "average" to 
compare two companies which are each rated at 0 to 5 for their environmental sustain- 
ability, and are rated at 0 to 100 for their financial viability. If an arithmetic mean were 
used instead of a geometric mean, the financial viability is given more weight because 
its numeric range is larger so a small percentage change in the financial rating (e.g. go- 
ing from 80 to 90) makes a much larger difference in the arithmetic mean than a large 
percentage change in environmental sustainability (e.g. going from 2 to 5). The use of a 
geometric mean "normalizes" the ranges being averaged, so that no range dominates the 
weighting, and a given percentage change in any of the properties has the same effect on 
the geometric mean. So, a 20% change in environmental sustainability from 4 to 4.8 has 
the same effect on the geometric mean as a 20% change in financial viability from 60 to 
72. 

In 2010, the geometric mean was introduced to compute the Human Develop- 
ment Index by United Nations Development Programme. Poor performance in 
any dimension is directly reflected in the geometric mean. That is to say, a low 
achievement in one dimension is not anymore linearly compensated for by high 
achievement in another dimension. The geometric mean reduces the level of sub- 
stitutability between dimensions and at the same time ensures that a 1% decline in 
index of, say, life expectancy has the same impact on the HDI as a 1% decline in 
education or income index. Thus, as a basis for comparisons of achievements, this 
method is also more respectful of the intrinsic differences across the dimensions 
than a simple average. 


As with the number 0, it is impossible to calculate the geometric mean with negative 
numbers. However, there are several work-arounds for this problem, all of which require 
that the negative values be converted or transformed to a meaningful positive equivalent 
value. Most often this problem arises when it is desired to calculate the geometric mean of 
a percent change in a population or a financial return, which includes negative numbers. 

For example, to calculate the geometric mean of the values +12%, -8%, 0% and +2%, 
calculate instead the geometric mean of their decimal multiplier equivalents 1.12, 0.92, 1 
and 1.02, which gives a geometric mean of 1.0125. Subtracting 1 from this value gives the 
geometric mean of +1.25% as a net rate of growth (in financial circles it is named the 
"Compound Annual Growth Rate C.A.G.R."). 

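The percent change example above can be reproduced as a short sketch with `statistics.geometric_mean` from the Python standard library (version 3.8 or later); the variable names are chosen here for illustration.

```python
from math import prod
from statistics import geometric_mean

changes = [0.12, -0.08, 0.0, 0.02]          # +12%, -8%, 0% and +2%
multipliers = [1 + c for c in changes]      # 1.12, 0.92, 1 and 1.02

g = geometric_mean(multipliers)
print(round(g, 4))      # geometric mean of the multipliers
print(round(g - 1, 4))  # net rate of growth per period

# Applying the mean rate over the four periods gives back the overall growth
print(g ** 4, prod(multipliers))
```

The rounded result is 1.0125, i.e. a net rate of growth of about +1.25% per period.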


Suppose that a bank offers an investment opportunity and plans for the first 
year an interest (this is absurd, but this is an example) with a rate (X - Y)% but 
for the second year an interest rate (X + Y)%. At the same time another bank 
provides a constant interest rate for two years: X%. We would say a little bit too fast 
that this is the same... In fact the two investments do not have the same profitability! 

In the first bank, a capital $C_0$ will give after the first year of interest: 

$$(X - Y)\% \cdot C_0 \tag{7.29}$$

and the second year: 

$$(X + Y)\% \cdot [(X - Y)\% \cdot C_0]$$

In the other bank we will have after one year: 

$$X\% \cdot C_0$$

and after the second year: 

$$X\% \cdot (X\% \cdot C_0)$$

and so on... 

As you can probably see, the two investments will not be identical if Y ≠ 0! X%, the 
arithmetic average of (X - Y)% and (X + Y)%, is therefore not the relevant average rate. 

Now if we write: 

$$r_1 = (X + Y)\% \text{ and } r_2 = (X - Y)\% \tag{7.33}$$

What is in reality the average value of the global interest rate r? 

After 2 years (for example), the capital is multiplied by $r_1 \cdot r_2$. If an average exists 
it will be denoted by r and the capital will thus be multiplied by $r^2$. Then we have 
the relation: 

$$r^2 = r_1 r_2 \Leftrightarrow r = \sqrt{r_1 r_2} \tag{7.34}$$

This is an example where we therefore see the geometric mean appear. Forgetting to use 
the geometric mean is a common mistake in companies, where some employees 
calculate the arithmetic average of the rates of increase of a reference value. 

D7. The "moving average" of order n is defined as: 

$$MM_n = \frac{x_1 + x_2 + \ldots + x_n}{n}$$

The moving average is used particularly in economics, where it represents the trend of a 
series of values. The number of points of the moving average is equal to the total number of 
points of the series less the number that you specify for the period, plus one. 



A moving average in finance is calculated from the average of a stock price over a given 
period: each point of a moving average of 100 sessions is the average of the last 100 
values. This curve, displayed simultaneously with the curve of the values, 
smooths the daily changes in the value and makes it easier to see the trends. 

The moving averages can be calculated for different time periods, which can generate 
short-term trends MMC (20 sessions according to the habits of the domain), medium 
(50-100 sessions) or long-term MML (over 100 sessions): 

Figure 4.72 - Graphical representation of a few moving averages 

The crossings of the moving averages with the price curve (sampled with a certain granularity) 
of the value generate (basic) purchase or sale signals depending on the case: 

• Buy signal: when the price curve crosses the moving average (MM) upwards. 

• Sell signal: when the price curve crosses the moving average (MM) downwards. 
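The definition of the moving average and the two basic signals above can be sketched as follows. The price series is hypothetical, and the functions `moving_average` and `crossing_signals` are illustrative helpers, not a trading method recommended by the text.

```python
def moving_average(values, n):
    """Simple moving average of order n: one point per full window of n values."""
    if n <= 0 or n > len(values):
        raise ValueError("n must be between 1 and the length of the series")
    return [sum(values[i - n + 1 : i + 1]) / n for i in range(n - 1, len(values))]

def crossing_signals(prices, n):
    """List of (index, 'buy' or 'sell') where the price crosses its moving average."""
    ma = moving_average(prices, n)
    aligned = prices[n - 1:]  # prices aligned with each moving-average point
    signals = []
    for k in range(1, len(ma)):
        was_above = aligned[k - 1] > ma[k - 1]
        is_above = aligned[k] > ma[k]
        if is_above and not was_above:
            signals.append((n - 1 + k, "buy"))   # price crosses the average upwards
        elif was_above and not is_above:
            signals.append((n - 1 + k, "sell"))  # price crosses the average downwards
    return signals

prices = [10, 11, 12, 11, 9, 8, 9, 11, 12]  # hypothetical closing prices
print(moving_average(prices, 3))
print(crossing_signals(prices, 3))
```

Only full windows are averaged here, so a series of 9 prices with a period of 3 yields 7 points.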

In addition to the moving average, note that there are a lot of other artificial indicators 
often used in finance (the R software has a package dedicated only to such indicators), 
for example the "upside/downside ratio". 

The idea is the following: you have a financial product (see section Economy) whose 
current price is $P_c$, for which you have a goal of high gain corresponding to a high price, 
which we will denote by $P_h$ (high price), and conversely a potential loss that you estimate 
at a price $P_l$ (low price). The ratio is then: 

$$UD_R = \frac{P_h - P_c}{P_c - P_l}$$




E1. For example, a financial product of 10.- with a low price of 5.- and a 
high price of 15.- has a ratio of $UD_R = 1$ and therefore an identical speculative 
factor for a gain or a loss of 5.-. 

E2. A financial product of 10.- with a low price of 5.- and a high price of 20.- 
has a ratio of $UD_R = 2$ and therefore twice the speculative potential gain compared 
to the loss. 


Some financial institutions recommend refusing ratios below 3. Investors also 
tend to reject ratios that are too high, which can be a sign of artificial inflation. 

D8. The "weighted average" (the moving average and the arithmetic average are just special 
cases of the weighted average with $w_i = 1$) is defined by: 

$$\mu_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

It is used for example in geometry to locate the centroid of a polygon, in physics to 
determine the center of gravity, in statistics to calculate the mean and in other advanced 
regression techniques, and in project management to forecast task durations. 

In the general case the weights $w_i$ represent the weighted influence or arbitrary/empirical 
influence of the element $x_i$ relative to the others. 

D9. The "functional mean" or "integral average" is defined as: 

$$\mu_f = \frac{1}{b - a}\int_a^b f(x)\,\mathrm{d}x$$

where $\mu_f$ depends on a function f of a real variable, integrated (see Differential and 
Integral Calculus) on a range [a, b]. It is often used in signal theory (electronics, electrotech- 



7.2.1 Laplace Smoothing 

To come back to the class frequencies seen above, and before proceeding with the study of some 
mathematical properties of the averages... you should know that when we work with discrete 
probability distributions it happens very (very) often that we meet a typical problem whose 
source is the size of the population. 

Consider as an example the case where we have 12 documents and we would like to estimate 
the probability of occurrence of the word "Viagra". We have on a sample the following values: 

Document ID    Word occurrences 

























Table 4.19 - Class frequencies of the word 

Table that we can represent in another way: 

Word occurrences 
























Table 4.20 - Respective frequencies classes of documents 

And here we have a common phenomenon: there is no record with 5 occurrences of the word 
of interest. The idea (very common in the field of Data Mining) is then to add artificial 
counts empirically, using a technique called "Laplace smoothing" which consists in adding 
k units to the count of each class. Therefore the table becomes: 


Word occurrences 
























Table 4.21 - Frequencies classes of documents with smoothing 

Obviously this type of technique is debatable and outside the scientific framework... We even 
hesitated to introduce it in the chapter of Numerical Methods (with all the other 
empirical numerical techniques) rather than here... 
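A minimal sketch of Laplace smoothing follows. The class counts are hypothetical (12 documents, none of them with 5 occurrences) and do not reproduce the tables of the text; the function name `laplace_smooth` is introduced here for illustration.

```python
from fractions import Fraction

def laplace_smooth(counts, k=1):
    """Add k units to every class count and renormalise to probabilities."""
    total = sum(counts.values()) + k * len(counts)
    return {cls: Fraction(c + k, total) for cls, c in counts.items()}

# Hypothetical data: number of documents (values) per number of occurrences (keys)
counts = {0: 5, 1: 3, 2: 2, 3: 1, 4: 1, 5: 0}

smoothed = laplace_smooth(counts, k=1)
print(smoothed[5])             # the empty class now has a non-zero probability
print(sum(smoothed.values()))  # the smoothed probabilities still sum to 1
```

Exact fractions are used so that the normalisation can be checked without rounding effects.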

7.2.2 Means and Averages properties 

Now we will see some relevant properties that connect some of these means and averages or are 
specific to a particular mean/average. 

The first properties are important, so take care to understand them: 

P1. The calculation of the arithmetic, root mean square and harmonic average/mean can be 
generalized using the following expression: 

$$\mu_m = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^m\right)^{1/m}$$

where we see: 

(a) For m = 1, we get the arithmetic average 

(b) For m = 2, we get the root mean square 

(c) For m = -1, we get the harmonic mean 
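The three special cases listed above can be checked numerically with a small function. The name `power_mean` and the sample values are assumptions made for this sketch; note that the expression is not defined for m = 0.

```python
from math import sqrt
from statistics import harmonic_mean, mean

def power_mean(values, m):
    """Generalized mean of order m (m must be non-zero)."""
    if m == 0:
        raise ValueError("the expression is not defined for m = 0")
    n = len(values)
    return (sum(x ** m for x in values) / n) ** (1 / m)

values = [1.0, 2.0, 4.0]  # hypothetical positive values

print(power_mean(values, 1), mean(values))             # arithmetic average
print(power_mean(values, 2), sqrt(21 / 3))             # root mean square
print(power_mean(values, -1), harmonic_mean(values))   # harmonic mean
```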

P2. The arithmetic average has the property of linearity, that is to say (without proof because 
it is simple to check): 

$$\overline{\lambda x + \mu} = \lambda \bar{x} + \mu$$

This is the statistical version of the property of the mean in the field of probabilities that 
we will see further. 

P3. The weighted sum of the deviations from the arithmetic average is zero. 



Proof 4.28.2. First, by definition, we know that: 

$$n = \sum_{i=1}^{r} n_i \quad \text{and} \quad \mu = \frac{1}{n}\sum_{i=1}^{r} n_i x_i \tag{7.41}$$

then we have: 

$$\sum_{i=1}^{r} n_i (x_i - \mu) = \sum_{i=1}^{r} n_i x_i - \mu \sum_{i=1}^{r} n_i = \sum_{i=1}^{r} n_i x_i - \left(\frac{1}{n}\sum_{i=1}^{r} n_i x_i\right) n = \sum_{i=1}^{r} n_i x_i - \sum_{i=1}^{r} n_i x_i = 0$$

Thus, this tool cannot be used as a measure of dispersion! 

By extension, the arithmetic average of the weighted deviations from the average is also 
equal to zero: 

$$\frac{\sum_{i=1}^{r} n_i (x_i - \mu)}{n} = 0 \tag{7.43}$$

□ Q.E.D. 

This result is quite important because it will be useful later for a better understanding 
of the concepts of standard deviation and variance. 

P4. Now we would like to prove that: 

$$\mu_h \le \mu_g \le \mu_a \le \mu_q \tag{7.44}$$


Comparisons between the above means/averages and the median or the 
weighted or moving averages do not make sense; this is why we won't compare them. 

Proof 4.28.3. First, we consider two nonzero real numbers $x_1$ and $x_2$ such that $x_2 > x_1 > 0$ and 
then we write: 

(a) The arithmetic average: 

$$\mu_a = \frac{x_1 + x_2}{2}$$

(b) The geometric mean: 

$$\mu_g = \sqrt{x_1 x_2}$$

(c) The harmonic mean: 

$$\frac{1}{\mu_h} = \frac{\dfrac{1}{x_1} + \dfrac{1}{x_2}}{2} = \frac{x_2 + x_1}{2 x_1 x_2} \Rightarrow \mu_h = \frac{2 x_1 x_2}{x_1 + x_2}$$


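(d) The root mean square: 

$$\mu_q^2 = \frac{x_1^2 + x_2^2}{2}$$

We will start by proving that $\mu_g \ge \mu_h$ by contradiction, by putting $\mu_g - \mu_h < 0$: 

$$\mu_g - \mu_h = \sqrt{x_1 x_2} - \frac{2 x_1 x_2}{x_1 + x_2} = \frac{\sqrt{x_1 x_2}\,x_1 + \sqrt{x_1 x_2}\,x_2 - 2 x_1 x_2}{x_1 + x_2} < 0$$
$$\Rightarrow \sqrt{x_1 x_2}\,x_1 + \sqrt{x_1 x_2}\,x_2 - 2 x_1 x_2 < 0$$
$$\Rightarrow \sqrt{x_1 x_2}\,x_1 + \sqrt{x_1 x_2}\,x_2 < 2 x_1 x_2$$
$$\Rightarrow \frac{x_1}{\sqrt{x_1 x_2}} + \frac{x_2}{\sqrt{x_1 x_2}} < 2$$
$$\Rightarrow \sqrt{\frac{x_1}{x_2}} + \sqrt{\frac{x_2}{x_1}} < 2$$

For convenience we will now put: 

$$y = \sqrt{\frac{x_2}{x_1}}$$

and we know that y > 1. We therefore have: 

$$\sqrt{\frac{x_1}{x_2}} + \sqrt{\frac{x_2}{x_1}} = \frac{1}{y} + y$$

and remember that we want to know whether it is possible that: 

$$\frac{1}{y} + y < 2$$

We can now easily check this statement from the following equivalences: 

$$\frac{1}{y} + y < 2 \Leftrightarrow \frac{y^2 + 1}{y} < 2 \Leftrightarrow y^2 + 1 < 2y \Leftrightarrow y^2 + 1 - 2y < 0 \Leftrightarrow (y - 1)^2 < 0 \tag{7.53}$$

There is therefore a contradiction, and this validates our initial hypothesis: 

$$\mu_g - \mu_h \ge 0 \Leftrightarrow \mu_g \ge \mu_h \tag{7.54}$$

Let us now see if $\mu_a \ge \mu_g$. 

Under the hypothesis $x_2 > x_1 > 0$, we now want to prove that: 

$$\frac{x_1 + x_2}{2} \ge \sqrt{x_1 x_2} \tag{7.55}$$

Now we have the following equivalences: 

$$\frac{x_1 + x_2}{2} \ge \sqrt{x_1 x_2} \Leftrightarrow (x_1 + x_2)^2 \ge 4 x_1 x_2 \Leftrightarrow x_1^2 + x_2^2 - 2 x_1 x_2 \ge 0 \Leftrightarrow (x_1 - x_2)^2 \ge 0$$

and the last expression is obviously correct because the square of a (real) number is never 
negative, which verifies our initial hypothesis: 

$$\mu_a - \mu_g \ge 0 \Leftrightarrow \mu_a \ge \mu_g$$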


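We will now prove that $\mu_q \ge \mu_a$ by contradiction, by putting $\mu_q - \mu_a < 0$: 

$$\mu_q - \mu_a = \sqrt{\frac{x_1^2 + x_2^2}{2}} - \frac{x_1 + x_2}{2} < 0$$
$$\Rightarrow \sqrt{\frac{x_1^2 + x_2^2}{2}} < \frac{x_1 + x_2}{2}$$
$$\Rightarrow \frac{x_1^2 + x_2^2}{2} < \frac{x_1^2 + 2 x_1 x_2 + x_2^2}{4}$$
$$\Rightarrow x_1^2 + x_2^2 < \frac{x_1^2 + 2 x_1 x_2 + x_2^2}{2}$$
$$\Rightarrow x_1^2 - 2 x_1 x_2 + x_2^2 < 0$$
$$\Rightarrow (x_1 - x_2)^2 < 0$$

But the square of a (real) number is never negative, which verifies our initial hypothesis: 

$$\mu_q - \mu_a \ge 0 \Leftrightarrow \mu_q \ge \mu_a \tag{7.59}$$

We then have: 

$$\mu_h \le \mu_g \le \mu_a \le \mu_q$$

□ Q.E.D. 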
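The chain of inequalities can also be checked numerically for two positive numbers. This is only a sanity check on random pairs, not a substitute for the proof; the function names and the tolerance used to absorb floating-point rounding are assumptions of this sketch.

```python
import math
import random

def four_means(x1, x2):
    """Harmonic, geometric, arithmetic and root mean square means of two positive numbers."""
    h = 2 * x1 * x2 / (x1 + x2)
    g = math.sqrt(x1 * x2)
    a = (x1 + x2) / 2
    q = math.sqrt((x1 ** 2 + x2 ** 2) / 2)
    return h, g, a, q

def chain_holds(x1, x2, tol=1e-9):
    """True if h <= g <= a <= q holds up to a small rounding tolerance."""
    h, g, a, q = four_means(x1, x2)
    return h <= g + tol and g <= a + tol and a <= q + tol

random.seed(0)  # reproducible hypothetical sample
pairs = [(random.uniform(0.1, 100), random.uniform(0.1, 100)) for _ in range(1000)]
print(all(chain_holds(x1, x2) for x1, x2 in pairs))
print(four_means(1.0, 4.0))
```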

Once these inequalities are proved, we can move on to a figure that we attribute to 
Archimedes to place three of these averages. The interest of this example is to show that 
there are some remarkable relations between statistics and geometry (coincidence??). 

Figure 4.73 - Starting point for the geometric representation for the various averages 


We will first write a = AB, b = BC and let O be the midpoint of AC. Thus, the circle is 
drawn with center O and radius OA. D is the intersection of the perpendicular to 
AC through B and of the circle $\Omega$ (we can choose the intersection we want). H is itself 
the orthogonal projection of B on OD. 

Archimedes says that OA is the arithmetic average of a and b, that BD is the geometric 
mean of a and b, and that DH is the harmonic mean of a and b. 

We first prove that (it could be considered trivial): 

$$OA = \frac{AC}{2} = \frac{a + b}{2}$$

Therefore OA is the arithmetic average $\mu_a$ of a and b. 
We have in the right-angled triangle ADB: 

$$AD^2 = DB^2 + BA^2$$

Then we have in the right-angled triangle BDC: 

$$DC^2 = BC^2 + DB^2 \tag{7.63}$$

We then add these two relations, and we get: 

$$2 DB^2 + BA^2 + BC^2 = AD^2 + DC^2 \tag{7.64}$$

We know that D is on a circle of diameter AC, so the triangle ADC is right-angled at D. Therefore: 

$$AD^2 + DC^2 = AC^2 \tag{7.65}$$

And then we replace BA and BC by a and b: 

$$2 DB^2 + a^2 + b^2 = AC^2 = (a + b)^2 \Leftrightarrow 2 DB^2 = 2ab \tag{7.66}$$

So finally: 

$$DB = \sqrt{ab}$$

And therefore, DB is the geometric mean $\mu_g$ of a and b. We now have to prove that DH is 
the harmonic mean of a and b. We have first, using the orthogonal projection as 
studied in the section of Vector Calculus: 

$$\vec{DO} \circ \vec{DB} = \|\vec{DO}\|\,\|\vec{DB}\| \cos(\alpha) = DO \cdot DH = \frac{a + b}{2}\,DH \tag{7.68}$$

Then we also have (again by orthogonal projection): 

$$\vec{DO} \circ \vec{DB} = \left(\|\vec{DO}\| \cos(\alpha)\right) \|\vec{DB}\| = DB \cdot DB = DB^2 \tag{7.69}$$

Therefore we have: 

$$DH = \frac{2\,DB^2}{a + b}$$

and since $DB = \sqrt{ab}$, we have then: 

$$DH = \frac{2ab}{a + b}$$

DH is therefore the harmonic mean of a and b. Archimedes was not wrong! 





7.3 Type of variables 

When talking about quantitative or qualitative variables, you sometimes hear variables 
being described as categorical (or sometimes nominal), or ordinal, or interval. Below we will 
define these terms and explain why they are important. 

Definitions (#88): 

D1. The "discrete variables" (obtained by counting), which belong to $\mathbb{Z}$: they are analyzed with statistical 
laws based on a countable definition domain, always strictly positive (the Poisson or 
Hypergeometric distributions are typical cases in industry). They are almost always 
represented graphically by histograms. 

D2. The "continuous variables" (obtained by measurement), which belong to $\mathbb{R}$: they are analyzed with 
statistical laws based on an uncountable domain of definition that is strictly positive or that may take any 
positive or negative value (typically the Normal distribution in industry). They are almost 
always represented graphically by histograms with class intervals. 

D3. The "attribute variables" (obtained by classification): they are not numerical data (except when they 
are coded with digits!) but qualitative data of the type Yes, No, Passed, Failed, On time, Late, 
red, green, blue, black, etc. Binary attribute data follow a Bernoulli distribution, while higher 
order qualitative variables have no average or standard deviation (indeed... try to 
calculate the mean and standard deviation of the qualitative values Red, Green 
and Pink...). 

Among attribute variables we mainly distinguish two subtypes: 

(a) A "categorical variable" (sometimes called a nominal variable) is one that has two 
or more categories, but there is no intrinsic ordering to the categories. For example, 
gender is a categorical variable having two categories (male and female) and there 
is no intrinsic ordering to the categories. Hair color is also a categorical variable 
having a number of categories (blonde, brown, brunette, red, etc.) and again, there 
is no agreed way to order these from highest to lowest. A purely categorical variable 
is one that simply allows you to assign categories but you cannot clearly order the 
variables. If the variable has a clear ordering, then that variable would be an ordinal 
variable, as described below. 

(b) An "ordinal variable" is similar to a categorical variable. The difference between 
the two is that there is a clear ordering of the variables. For example, suppose you 
have a variable, economic status, with three categories (low, medium and high). In 
addition to being able to classify people into these three categories, you can order 
the categories as low, medium and high. Now consider a variable like educational 
experience (with values such as elementary school graduate, high school graduate, 
some college and college graduate). These also can be ordered as elementary school, 
high school, some college, and college graduate. 

Understanding the different types of data is an important discipline for the engineers because it 
has important implications for the type of analysis tools and techniques that will be used. 


A common question regarding the collection of data is what is the amount that should be col- 
lected. In fact it depends on the desired level of accuracy. We will see much further in this 
section (with proof!) how to mathematically determine the amount of data to collect. 

Now that we are relatively familiar with the concept of average (mean), we can move on to more 
formal calculations, which will now make sense. 

7.3.1 Discrete Variables and Moments 

Let X be an independent variable (an individual of a sample, whose property is 
independent of other individuals) that can take discrete random values (realizations of the vector 
$(x_1, x_2, \ldots, x_n)$) with respective probabilities $(p_1, p_2, \ldots, p_n)$ where, by the axioms of 
probabilities (see section Probabilities): 

$$p_i \in [0, 1] \qquad \sum_i p_i = 1 \tag{7.72}$$

Definitions (#89): 

D1. Let X be a numeric (quantitative) random variable (r.v.). It will be fully described in 
practice, most of the time, by the probability (for discrete variables) of each realization 
of this variable, or by the cumulative probability (for discrete AND continuous variables) 
of being less than or equal to a given value x. This cumulative probability is 
denoted by: 

$$F(x) = P(X \le x), \ \forall x \in \mathbb{R} \tag{7.73}$$

$$1 \ge P(X) \ge 0 \quad \text{and} \quad 1 \ge F(x) \ge 0 \tag{7.74}$$

where F(x) is named the "repartition function" of the random variable X. It is the 
theoretical proportion of the population whose value is less than or equal to x. It follows that: 

$$P(X > x) = 1 - F(x) \Leftrightarrow P(X \le x) + P(X > x) = 1 \tag{7.75}$$

More generally, for any two numbers a and b with a < b, we have: 

$$P(a < X \le b) = F(b) - F(a) \tag{7.76}$$

D2. The "empirical repartition function" is naturally defined by (we have indicated the 
different notations that you can find in the literature): 

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} 1_{x_i \le x} = \frac{1}{n}\sum_{i=1}^{n} \delta_{x_i \le x}$$

associated with the sample of independent and identically distributed variables which, as 
we know, is named a "random vector" and denoted by $(x_1, x_2, \ldots, x_n)$. 



It is simply the cumulative frequencies of appearance normalized to unity below a certain 
fixed value (approach that the majority of human beings are naturally using when seeking 
the repartition function). 

So if we take again the example of wages already used above, then we have for example 
for x fixed to 1,800: 

Ordered Wages Xi < x 

Frequencies l Xi < x 



































Table 4.22 - Example of the empirical repartition function 

And then: 

$$F_{17}(x \le 1{,}800) = \frac{1}{17}\sum_{i=1}^{17} 1_{x_i \le x} = \frac{1}{17}\cdot 10 \cong 59\% \tag{7.78}$$

The repartition function is clearly a monotonically increasing function (or more precisely 
"non-decreasing") whose values range from 0 to 1. 

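The empirical repartition function can be sketched in a few lines. The 17 wages below are hypothetical (they are not the values of the table of the text); they were only chosen so that 10 of them are less than or equal to 1,800, as in the example above.

```python
def empirical_cdf(sample, x):
    """Proportion of the sample values that are less than or equal to x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

# Hypothetical sample of 17 wages
wages = [1200, 1250, 1300, 1400, 1450, 1500, 1550, 1600, 1700,
         1800, 1900, 2100, 2400, 2800, 3200, 4000, 5000]

print(round(empirical_cdf(wages, 1800), 2))  # 10 values out of 17

# The function is non-decreasing and ranges from 0 to 1
grid = [1000, 1500, 1800, 3000, 6000]
print([round(empirical_cdf(wages, x), 2) for x in grid])
```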
7.3. 1.1 Mean and Deviation of Discrete Random Variables 

Definition (#90): We define the "expectation" or "mean", also named "moment of order 
1", of the random variable X by the relation (with various notations): 

$$\mu_X = \mathrm{E}(X) = \mathrm{E}[X] = \sum_i p_i x_i$$

also sometimes named "parts rule". 

In other words, every event in the sample space is associated with a probability, and we
also associate it with a value (given by the random variable). The question is then: what
value can we expect in the long term? The expected value (the mean...) is the weighted
average, by the probability, of the values of all events of the sample space.

If the probability is given by a discrete distribution function f(xi) (see the definitions of distri- 
bution functions later below in the text) of the random variable, we then have: 

p_i = f(x_i) \Rightarrow \mu_X = E(X) = \sum_i x_i f(x_i) (7.80)


R1. The mean \mu_X can also be written simply \mu if there is no possible confusion about
the random variable.

R2. If we consider each realization of the random variable (x_1, x_2, ..., x_n) as the
components of a vector \vec{x} and each associated probability (or weight) (p_1, p_2, ..., p_n)
as the components of a vector \vec{p}, we can write the mean in a technical way using the
scalar product (see section Vector Calculus), often written:

\mu_X = \vec{p} \circ \vec{x} = \langle \vec{p}, \vec{x} \rangle = \sum_{i=1}^{n} p_i x_i = E(X) (7.81)

Here are the most important mathematical properties of the mean for any random variable 
(whatever the distribution law!) and that we will often use throughout this section (and many 
other involving statistics): 

P1. Multiplication by a constant (homogeneity):

E(aX) = \sum_i a x_i P(X = x_i) = a \sum_i x_i P(X = x_i) = a E(X) (7.82)

P2. Sum of two random variables (independent or not!):

E(X + Y) = \sum_i \sum_j (x_i + y_j) P((X = x_i) \cap (Y = y_j))

= \sum_i \sum_j x_i P((X = x_i) \cap (Y = y_j)) + \sum_i \sum_j y_j P((X = x_i) \cap (Y = y_j))

= \sum_i x_i \sum_j P((X = x_i) \cap (Y = y_j)) + \sum_j y_j \sum_i P((X = x_i) \cap (Y = y_j))

= \sum_i x_i P((X = x_i) \cap \bigcup_j (Y = y_j)) + \sum_j y_j P((Y = y_j) \cap \bigcup_i (X = x_i))

= \sum_i x_i P(X = x_i) + \sum_j y_j P(Y = y_j) = E(X) + E(Y)

where we used in the 4th line the property seen in the section Probabilities (valid for
pairwise disjoint events A_i):

P(\bigcup_{i \in \mathbb{N}} A_i) = \sum_{i \in \mathbb{N}} P(A_i)

We deduce that for n random variables X_i following any probability distribution:

E(\sum_{i=1}^{n} X_i) = \sum_{i=1}^{n} E(X_i) (7.85)


P3. The mean of a constant a is equal to the constant itself:

E(a) = \sum_i a p_i = a \sum_i p_i = a \cdot 1 = a (7.86)

P4. Mean of a product of two random variables:

E(X \cdot Y) = \sum_{i,j} x_i y_j P(X = x_i, Y = y_j) (7.87)

And if the two random variables are independent, then the joint probability is equal to the
product of the individual probabilities (see section Probabilities). Therefore we have:

E(X \cdot Y) = \sum_{i,j} x_i y_j P(X = x_i, Y = y_j) = \sum_{i,j} x_i y_j P(x_i) P(y_j)

= \sum_i x_i P(x_i) \sum_j y_j P(y_j) = E(X) E(Y)

So the mean of the product of independent random variables is always equal to the product of
their means.

We will assume as obvious that these four properties extend to the continuous case! 
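The properties P1, P2 and P4 can be checked numerically on a small example. The sketch below assumes two independent discrete variables with hypothetical marginal laws `px` and `py` (our own illustrative values); the helper `expect` is not part of the text:

```python
from itertools import product

# Hypothetical marginal laws of two independent discrete variables
px = {1: 0.2, 2: 0.5, 3: 0.3}
py = {0: 0.4, 10: 0.6}

# Joint law under independence: P(X=x, Y=y) = P(X=x) P(Y=y)
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

def expect(g):
    """E[g(X, Y)] computed as a probability-weighted sum over the joint law."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_scaled = expect(lambda x, y: 3 * x)    # P1: E(aX) = a E(X)
e_sum = expect(lambda x, y: x + y)       # P2: E(X + Y) = E(X) + E(Y)
e_prod = expect(lambda x, y: x * y)      # P4 (independent case): E(XY) = E(X) E(Y)
```

Note that P2 would still hold for a joint law that does not factorize, whereas the product rule P4 relies on the independence assumption built into `joint`.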

Definition (#91): After having characterized the central tendency by the mean, it is interesting
to have an indicator that reflects the dispersion around the mean. This is done with a value named
"variance of X" or "second-order centered moment" or "mean square error (MSE)", written
V(X) or \sigma_X^2 (read "sigma square") and given in its discrete form by:

\sigma_X^2 = V(X) = MSE = E[(X - \mu_X)^2] = \sum_i (x_i - \mu_X)^2 f(x_i) = \sum_i (x_i - \mu_X)^2 p_i (7.89)

The variance is however not directly comparable to the mean because of the fact that the units 
of the variance are the square of the unit of the random variable, which follows directly from its 
definition. To have an indicator of dispersion that can be compared to the parameters of central 
tendency (mean, median and ... mode), it then suffices to take the square root of the variance. 

For convenience, we define the "standard deviation" of X by:

\sigma_X = \sigma(X) = \sqrt{V(X)}

R1. The standard deviation \sigma_X of the random variable X can be written simply \sigma if there
is no possible confusion.

R2. The standard deviation and variance are, in the literature, often named "dispersion
parameters" as opposed to the mean, mode and median that are named
"positional parameters".

Definition (#92): The ratio (expressed in %):

CV = \frac{\sigma}{\mu}

is often used in business to compare the mean and the standard deviation and is named the
"coefficient of variation C.V.". It has no units (which is its main advantage!) and
many industrial statistical methods consider that a good C.V. should ideally be only a few
%.

More generally, for any statistical estimator \hat{\theta} (sum, average, median, etc.) we can build a
coefficient of variation such that:

CV = \frac{\sigma_{\hat{\theta}}}{\hat{\theta}} (7.92)


Thus, in practice we consider that: 

Coefficient of variation 











World Class 


Rarely achieved 

Table 4.23 - Qualitative judgments of C.Vs commonly accepted 

Why do we find a square (respectively a square root) in the definition of the variance? The 
intuitive reason is simple (the rigorous much less ...). Remember, that we have shown above 
that the sum of the deviations from the actual weighted average is always zero: 


\sum_i n_i (x_i - \mu) = 0 (7.93)

If we assimilate the size of each sample by the probability by normalizing the sample size with 
respect to n, we come upon a relation that is the same as the variance with the difference that 
the term in brackets is not squared. And then we immediately see the problem... the dispersion 
measure is always zero, hence the need to bring this to the square. 

We could, however, imagine to use the absolute value of deviations from the mean, but for a 
number of reasons that we will see later during our study of estimators, the choice of squaring 
is quite natural. 

Note, however, the common use in the industry of two common other indicators of dispersion: 

1 . "The mean absolute deviation" (mean of the absolute values of deviations from the mean): 


AVEDEV = \frac{\sum_{i=1}^{n} |x_i - \mu|}{n} (7.94)



Which is a very elementary indicator used when we do not want to make statistical
inference on a series of measures. This deviation can be easily calculated in the English
version of Microsoft Excel 11.8346 using the AVEDEV() function.

2. The "median absolute deviation" denoted MAD (median of absolute values of deviations 
from the median): 

MAD = M_e(|X - M_e(X)|) (7.95)

which is considered a more robust measure of dispersion than the mean
absolute deviation or the standard deviation (unfortunately this indicator is not natively
integrated in spreadsheet software).


Consider the following measures of a random variable X:

(1, 1, 2, 2, 4, 6, 9)

where the median value is given, as we know, by:

M_e(X) = M_e(1, 1, 2, 2, 4, 6, 9) = 2

The absolute deviations from the median are then:

|X - M_e(X)| = (1, 1, 0, 0, 2, 4, 7)

Placed in ascending order, we then have:

(0, 0, 1, 1, 2, 4, 7)

where we easily identify the median absolute deviation, which is:

MAD = M_e(0, 0, 1, 1, 2, 4, 7) = 1
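Since the MAD is not natively available in spreadsheets, here is a minimal Python sketch of both indicators, applied to the series of the example above. The function names are our own:

```python
from statistics import fmean, median

def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean (AVEDEV)."""
    center = fmean(values)
    return fmean([abs(v - center) for v in values])

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (MAD)."""
    center = median(values)
    return median([abs(v - center) for v in values])

measures = [1, 1, 2, 2, 4, 6, 9]
mad = median_absolute_deviation(measures)   # 1, as found by hand above
avedev = mean_absolute_deviation(measures)
print(mad, avedev)
```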


In the case where we have a series of measures at our disposal, we can estimate the experimental
value of the mean (expectation) and of the variance with the following estimators (they are simply
the average and the variance of a sample when the events are equally likely), with the
specific notation:

\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i and \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2

Proof 4.28.4. First for the mean:

\mu = E(X) = \sum_{i=1}^{n} p_i x_i = \sum_{i=1}^{n} \frac{1}{n} x_i = \frac{1}{n} \sum_{i=1}^{n} x_i = \hat{\mu}

And for the variance:

\sigma^2 = E((X - E(X))^2) = E((X - \mu)^2) = \sum_{i=1}^{n} p_i (x_i - \mu)^2

= \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 = \hat{\sigma}^2

□ Q.E.D.

Theorem 4.29. Let us now prove a very nice little property: the arithmetic average is an
optimum for the sum of squared errors.

Proof 4.29.1.

\sum_{i=1}^{n} (x_i - a)^2 = \sum_{i=1}^{n} x_i^2 - 2a \sum_{i=1}^{n} x_i + n a^2

And if we search for the value of a such that the derivative of the above expression is equal to zero:

\frac{d}{da} ( \sum_{i=1}^{n} x_i^2 - 2a \sum_{i=1}^{n} x_i + n a^2 ) = 0

then a is an optimum. We have therefore:

\frac{d}{da} ( \sum_{i=1}^{n} x_i^2 - 2a \sum_{i=1}^{n} x_i + n a^2 ) = -2 \sum_{i=1}^{n} x_i + 2na = 0

or after rearrangement and an elementary simplification we get:

a = \frac{1}{n} \sum_{i=1}^{n} x_i

□ Q.E.D.

It is effectively the arithmetic average! Now to see if it is a maximum or a minimum
we just need to calculate the second derivative (see section Differential and Integral
Calculus) and see if it gives a positive constant (i.e. the first derivative increases when a increases).
The second derivative is 2n > 0, so we immediately see that it is effectively a minimum!
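The result can be checked numerically with a brute-force scan. The following sketch (our own illustration, with an arbitrary series and an arbitrary grid step of 0.01) compares the sum of squared errors at the arithmetic average with its value at nearby candidates:

```python
def sse(values, a):
    """Sum of squared errors of the values around a candidate a."""
    return sum((x - a) ** 2 for x in values)

values = [1, 1, 2, 2, 4, 6, 9]
mean = sum(values) / len(values)

# Candidates on a grid centered on the arithmetic average
candidates = [mean + k * 0.01 for k in range(-200, 201)]
best = min(candidates, key=lambda a: sse(values, a))
print(best, mean)
```

A grid scan proves nothing in general, of course; it only illustrates on one series what the derivative argument above establishes for all of them.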


The sum that we see in the expression of the variance (standard deviation) is
named the "sum of squared deviations from the mean" or "sum of squared errors from
the mean". We also name it the "total sum of squares", or "total variation", or "sum of
squared errors" in the context of the study of the ANOVA (see further below).

Before we continue, let us recall the concept of geometric mean seen above (widely used
for returns in finance or for growth analyses in % of sales):

\mu_g = \sqrt[n]{\prod_{i=1}^{n} x_i}

This is fine, but employees in financial departments also need to calculate the standard deviation of
this average. The idea is then to take the logarithm to reduce it to a simple arithmetic mean (it
is still obviously an estimator!):

\ln(\mu_g) = \ln( \sqrt[n]{\prod_{i=1}^{n} x_i} ) = \frac{1}{n} \sum_{i=1}^{n} \ln(x_i)

Therefore, since the logarithm of the geometric mean is the arithmetic mean of the log
values, the logarithm of the geometric standard deviation (with a physicist-like reasoning...)
will be:

\ln(\sigma_g) = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (\ln(x_i) - \ln(\mu_g))^2 } = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \ln^2( \frac{x_i}{\mu_g} ) } (7.110)

Then we just take the exponential of the standard deviation of the logarithms of the values to
have the "geometric standard deviation":

\sigma_g = e^{ \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \ln^2( x_i / \mu_g ) } }
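A minimal Python sketch of the geometric mean and of the geometric standard deviation, computed through the logarithms as described above (strictly positive values are assumed; the function names and the growth factors are our own illustrative choices):

```python
import math

def geometric_mean(values):
    """exp of the arithmetic mean of the logarithms."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

def geometric_std(values):
    """exp of the standard deviation (divisor n) of the logarithms."""
    mu_g = geometric_mean(values)
    mean_sq = sum(math.log(x / mu_g) ** 2 for x in values) / len(values)
    return math.exp(math.sqrt(mean_sq))

growth = [1.10, 0.95, 1.20, 1.05]   # hypothetical yearly growth factors
print(geometric_mean(growth), geometric_std(growth))
```

Unlike the arithmetic standard deviation, the geometric standard deviation is a dimensionless multiplicative factor: a value of 1 means no dispersion at all.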




The variance can also be written in the very important way named the "Huygens relation" or
"König-Huygens theorem" or "Steiner translation theorem" that we will reuse several times
thereafter. Let’s see what it is:

V(X) = E[(X - \mu_X)^2] = \sum_i (x_i - \mu_X)^2 f(x_i) = \sum_i (x_i^2 - 2 x_i \mu_X + \mu_X^2) f(x_i)

= \sum_i x_i^2 f(x_i) - 2 \mu_X \sum_i x_i f(x_i) + \mu_X^2 \sum_i f(x_i) = \sum_i x_i^2 f(x_i) - 2 \mu_X^2 + \mu_X^2 (7.112)

= \sum_i x_i^2 f(x_i) - \mu_X^2 = E(X^2) - \mu_X^2 = E(X^2) - E(X)^2

Let us now make a short detour to a scenario that commonly generates errors in business when
several statistical series are handled (a very common case in industry as well as in insurance
or finance)!

Consider two data series on the same character:

• (x_1, n_1), (x_2, n_2), ..., (x_p, n_p) sample of total size n, arithmetic average \bar{x}, standard
deviation \sigma_x.

• (y_1, m_1), (y_2, m_2), ..., (y_q, m_q) sample of total size m, arithmetic average \bar{y}, standard
deviation \sigma_y.

We then have for the overall average \bar{z}:

\bar{z} = \frac{ \sum_{i=1}^{p} n_i x_i + \sum_{i=1}^{q} m_i y_i }{ n + m } = \frac{ n \frac{1}{n} \sum_{i=1}^{p} n_i x_i + m \frac{1}{m} \sum_{i=1}^{q} m_i y_i }{ n + m } = \frac{ n \bar{x} + m \bar{y} }{ n + m }

So the average of the averages is not equal to the overall average (first common mistake in
business) except if the two data series have the same sample size (n = m)!

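Let us have a look at the standard deviation, still in the same situation. First remember that
we have:

\sigma_x^2 = \sum_{i=1}^{p} p_i (x_i - \bar{x})^2 = \frac{1}{n} \sum_{i=1}^{p} n_i (x_i - \bar{x})^2 and \sigma_y^2 = \frac{1}{m} \sum_{i=1}^{q} m_i (y_i - \bar{y})^2

To continue, recall that we have previously proved the Huygens theorem and therefore, for a
series (z_i, k_i) of total size k:

V(Z) = \sigma_z^2 = E(Z^2) - E(Z)^2 = \frac{1}{k} \sum_i k_i z_i^2 - \bar{z}^2

Therefore we have for the two series taken together:

\sigma_z^2 = \frac{1}{n + m} ( \sum_{i=1}^{p} n_i x_i^2 + \sum_{i=1}^{q} m_i y_i^2 ) - ( \frac{n \bar{x} + m \bar{y}}{n + m} )^2

= \frac{1}{n + m} ( n \frac{1}{n} \sum_{i=1}^{p} n_i x_i^2 + m \frac{1}{m} \sum_{i=1}^{q} m_i y_i^2 ) - ( \frac{n \bar{x} + m \bar{y}}{n + m} )^2

= \frac{1}{n + m} ( n ( \frac{1}{n} \sum_{i=1}^{p} n_i x_i^2 - \bar{x}^2 ) + n \bar{x}^2 + m ( \frac{1}{m} \sum_{i=1}^{q} m_i y_i^2 - \bar{y}^2 ) + m \bar{y}^2 ) - ( \frac{n \bar{x} + m \bar{y}}{n + m} )^2 (7.116)

= \frac{n \sigma_x^2 + m \sigma_y^2}{n + m} + \frac{n \bar{x}^2 + m \bar{y}^2}{n + m} - ( \frac{n \bar{x} + m \bar{y}}{n + m} )^2

= \frac{n \sigma_x^2 + m \sigma_y^2}{n + m} + \frac{ (n \bar{x}^2 + m \bar{y}^2)(n + m) - (n \bar{x} + m \bar{y})^2 }{ (n + m)^2 }

= \frac{n \sigma_x^2 + m \sigma_y^2}{n + m} + \frac{ (n^2 \bar{x}^2 + mn \bar{y}^2 + mn \bar{x}^2 + m^2 \bar{y}^2) - (n^2 \bar{x}^2 + 2 n m \bar{x} \bar{y} + m^2 \bar{y}^2) }{ (n + m)^2 }

= \frac{n \sigma_x^2 + m \sigma_y^2}{n + m} + \frac{ mn \bar{y}^2 + mn \bar{x}^2 - 2 n m \bar{x} \bar{y} }{ (n + m)^2 }

= \frac{n \sigma_x^2 + m \sigma_y^2}{n + m} + nm \frac{ \bar{y}^2 + \bar{x}^2 - 2 \bar{x} \bar{y} }{ (n + m)^2 }

= \frac{n \sigma_x^2 + m \sigma_y^2}{n + m} + nm \frac{ (\bar{x} - \bar{y})^2 }{ (n + m)^2 }

So we see that the overall variance is not equal to the weighted average of the variances (second
common mistake in business) unless the arithmetic averages are the same in both series
(\bar{x} = \bar{y}), and that it reduces to the simple average of the two variances only if, in
addition, the sample sizes are the same (n = m)!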
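The two relations for the overall average and the overall variance can be checked against a direct computation on the merged series. The sketch below uses two hypothetical series (each value with a frequency of 1) and the standard-library functions `fmean` and `pvariance` (variance with divisor n); the function name `pooled_mean_variance` is our own:

```python
from statistics import fmean, pvariance

def pooled_mean_variance(xs, ys):
    """Overall mean and variance from the sizes, means and variances of two series."""
    n, m = len(xs), len(ys)
    mean_x, mean_y = fmean(xs), fmean(ys)
    var_x, var_y = pvariance(xs), pvariance(ys)
    mean_z = (n * mean_x + m * mean_y) / (n + m)
    var_z = ((n * var_x + m * var_y) / (n + m)
             + n * m * (mean_x - mean_y) ** 2 / (n + m) ** 2)
    return mean_z, var_z

# Hypothetical series of different sizes and different means
xs = [10.0, 12.0, 14.0]
ys = [20.0, 22.0, 24.0, 26.0, 28.0]

mean_z, var_z = pooled_mean_variance(xs, ys)
naive_mean = (fmean(xs) + fmean(ys)) / 2   # "average of the averages": wrong here
print(mean_z, naive_mean, var_z)
```

With these values the average of the averages differs from the overall average because the two series do not have the same size.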

Consider now X being a random variable of mean \mu (constant and determined value) and
variance \sigma^2 (constant and determined value). We define the "reduced centered variable" by the
relation:

Y = \frac{X - \mu}{\sigma}

Theorem 4.30. We prove in a very simple way by using the property of linearity of the mean
and the property of scalar multiplication of the variance that:

E(Y) = 0, V(Y) = 1

Proof 4.30.1. For the proof we just use the definitions of the expected mean and variance (using
the Huygens theorem for the latter). So let us begin with the mean:

E(Y) = E( \frac{X - \mu}{\sigma} ) = \frac{1}{\sigma} E(X - \mu) = \frac{1}{\sigma} (E(X) - E(\mu)) = \frac{1}{\sigma} (\mu - \mu) = 0 = \mu_Y (7.120)

And now with the variance using the Huygens theorem:

V(Y) = E(Y^2) - E(Y)^2 = E(Y^2) - \mu_Y^2 = E(Y^2) - 0^2 = E(Y^2)

= E( ( \frac{X - \mu}{\sigma} )^2 ) = \frac{1}{\sigma^2} E(X^2 - 2 \mu X + \mu^2) = \frac{1}{\sigma^2} ( E(X^2) - 2 \mu E(X) + \mu^2 )

= \frac{1}{\sigma^2} ( E(X^2) - 2 \mu^2 + \mu^2 ) = \frac{1}{\sigma^2} ( E(X^2) - \mu^2 ) = \frac{1}{\sigma^2} ( (V(X) + E(X)^2) - \mu^2 )

= \frac{1}{\sigma^2} ( V(X) + \mu^2 - \mu^2 ) = \frac{1}{\sigma^2} V(X) = \frac{\sigma^2}{\sigma^2} = 1

□ Q.E.D.
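The theorem can be illustrated on a sample by treating its values as equally likely. In the sketch below, the series is hypothetical and `pstdev` is the standard-library standard deviation with divisor n:

```python
from statistics import fmean, pstdev

values = [1200.0, 1500.0, 1800.0, 2100.0, 4500.0]   # hypothetical measures
mu, sigma = fmean(values), pstdev(values)

# Reduced centered values: subtract the mean, divide by the standard deviation
reduced = [(x - mu) / sigma for x in values]

print(fmean(reduced), pstdev(reduced))   # close to 0 and 1 (up to rounding)
```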

Thus, any statistical distribution defined by a mean and a standard deviation can be transformed
into another distribution that is often easier to analyze statistically. By making this
transformation, we obtain a random variable for which the parameters of the distribution law no
longer need to be known. When we do that with other laws, and in the general case, we speak of
"pivotal variables".

Here are some important mathematical properties of the variance: 

P1. Multiplication by a constant:

V(aX) = \sum_i f(x_i) (a x_i - a \mu)^2 = a^2 \sum_i f(x_i) (x_i - \mu)^2 = a^2 V(X) (7.122)

P2. Sum of two random variables:

V(X + Y) = \sum_i \sum_j ( (x_i + y_j) - (\mu_X + \mu_Y) )^2 f(x_i, y_j)

= \sum_i \sum_j ( (x_i - \mu_X) + (y_j - \mu_Y) )^2 f(x_i, y_j)

= \sum_i \sum_j ( (x_i - \mu_X)^2 + 2 (x_i - \mu_X)(y_j - \mu_Y) + (y_j - \mu_Y)^2 ) f(x_i, y_j)

= \sum_{i,j} (x_i - \mu_X)^2 f(x_i, y_j) + 2 \sum_{i,j} (x_i - \mu_X)(y_j - \mu_Y) f(x_i, y_j) + \sum_{i,j} (y_j - \mu_Y)^2 f(x_i, y_j)

= V(X) + 2 \sum_{i,j} (x_i - \mu_X)(y_j - \mu_Y) f(x_i, y_j) + V(Y)

= V(X) + 2 E[(X - \mu_X)(Y - \mu_Y)] + V(Y)

:= V(X) + 2 cov(X, Y) + V(Y)

Where we meet for the first time the concept of "covariance" denoted by cov().

P4. Product of two random variables (using the Huygens theorem):

V(X \cdot Y) = E((XY)^2) - E(XY)^2 = E(X^2 Y^2) - E(XY)^2 (7.124)

And if the two random variables are independent, we get:

V(X \cdot Y) = E(X^2) E(Y^2) - E(X)^2 E(Y)^2 (7.125)

Which we can rewrite using once again the Huygens theorem, in the form E(X^2) = V(X) + E(X)^2:

V(X \cdot Y) = (V(X) + E(X)^2)(V(Y) + E(Y)^2) - E(X)^2 E(Y)^2 (7.126)

We will assume as obvious that these four properties extend to the continuous case! 


7.3.1.2 Discrete Covariance

We have seen in one of the last equations the concept of "covariance" for which we will
determine a more convenient expression later:

cov(X, Y) = c_{X,Y} = E[(X - \mu_X)(Y - \mu_Y)]


We now introduce a more general expression involving the covariance, very important in many
application fields:

V(X + Y + Z) = \sum_i \sum_j \sum_k ( (x_i + y_j + z_k) - (\mu_X + \mu_Y + \mu_Z) )^2 f(x_i, y_j, z_k)

= \sum_i \sum_j \sum_k ( (x_i - \mu_X) + (y_j - \mu_Y) + (z_k - \mu_Z) )^2 f(x_i, y_j, z_k)

= \sum_i \sum_j \sum_k (A + B + C)^2 f(x_i, y_j, z_k)

= \sum_i \sum_j \sum_k (A^2 + B^2 + C^2 + 2AB + 2BC + 2AC) f(x_i, y_j, z_k)

= V(X) + V(Y) + V(Z) + 2 \sum_i \sum_j \sum_k (AB + BC + AC) f(x_i, y_j, z_k)

= V(X) + V(Y) + V(Z) + 2 cov(X, Y) + 2 cov(Y, Z) + 2 cov(X, Z)

where A = x_i - \mu_X, B = y_j - \mu_Y and C = z_k - \mu_Z. Now we change the notation to simplify even more:

V(X_1 + X_2 + X_3) = V(X_1) + V(X_2) + V(X_3) + 2 cov(X_1, X_2) + 2 cov(X_2, X_3) + 2 cov(X_1, X_3)

= \sum_{i=1}^{3} V(X_i) + 2 \sum_{i<j} cov(X_i, X_j)

Therefore in the general case:

V( \sum_{i=1}^{n} X_i ) = \sum_{i=1}^{n} V(X_i) + 2 \sum_{i<j} cov(X_i, X_j)

Or using the standard deviation:

\sigma_{\sum X_i} = \sqrt{ \sum_{i=1}^{n} \sigma_{X_i}^2 + 2 \sum_{i<j} cov(X_i, X_j) }


Using the properties of the mean (especially that E(X) is a constant and that E(c) = c for any
constant c) we can write the covariance in a much simpler way for calculation purposes:

cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E[XY - E(X)Y - X E(Y) + E(X)E(Y)]

= E(XY) - E(E(X)Y) - E(X E(Y)) + E(E(X)E(Y))

= E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y)

= E(XY) - E(X)E(Y)

and we obtain the relation widely used in practice in statistics and finance, called the
"covariance formula":

c_{X,Y} = cov(X, Y) = E(XY) - E(X)E(Y)

which is however best known when written as:

c_{X,Y} = cov(X, Y) = \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{x} \bar{y}

If X = Y (equivalent to a univariate covariance) we fall back again on the Huygens theorem:

c_{X,X} = cov(X, X) = E(XX) - E(X)E(X) = E(X^2) - E(X)^2


Statistics can be partitioned according to the number of random variables we study. Thus, 
when a single random variable is studied, we speak of "univariate statistics", for two 
random variables of "bivariate statistics" and in general, of "multivariate statistics". 


If the realizations are equally likely, we find the covariance in the literature in the
following form, sometimes named "Pearson covariance", which derives from the calculations that
we have done previously with the mean:

c_{X,Y} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \mu_Y)(x_i - \mu_X) (7.136)

Covariance is a measure of the simultaneous variation of X and Y. Indeed, if X and Y generally
grow simultaneously, the products (y_i - \mu_Y)(x_i - \mu_X) will be positive (positive correlation),
whereas if Y decreases as X increases, these same products will be negative (negative
correlation). Note that if we distribute the terms of the last equation, we have:

c_{X,Y} = cov(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \mu_Y)(x_i - \mu_X)

= \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})

= \frac{1}{n} \sum_{i=1}^{n} [ x_i (y_i - \bar{y}) - \bar{x} (y_i - \bar{y}) ]

= \frac{1}{n} ( \sum_{i=1}^{n} x_i (y_i - \bar{y}) - \bar{x} \sum_{i=1}^{n} (y_i - \bar{y}) )

and we have already shown that the sum of the deviations from the mean is zero. Hence we get
another common way to write the covariance:

c_{X,Y} = cov(X, Y) = \frac{1}{n} \sum_{i=1}^{n} x_i (y_i - \bar{y}) (7.138)

and by symmetry:

c_{X,Y} = cov(X, Y) = \frac{1}{n} \sum_{i=1}^{n} y_i (x_i - \bar{x}) (7.139)

So in the end, in the equiprobable case, we have three equivalent and important relations
used in various sections of this book:

c_{X,Y} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n} \sum_{i=1}^{n} x_i (y_i - \bar{y}) = \frac{1}{n} \sum_{i=1}^{n} y_i (x_i - \bar{x})

In the section Theoretical Computing, for the study of linear regression and factor analysis, we
will need the explicit expression of the bilinearity property of the covariance. To see what it is
exactly, consider three random variables X, Y and Z and two constants a and b. Then using the
relations given above, we have:

cov(Y, aX + bZ) = \frac{1}{n} \sum_i (y_i - \bar{y})(a x_i + b z_i) = \frac{1}{n} \sum_i [ (y_i - \bar{y}) a x_i + (y_i - \bar{y}) b z_i ]

= \frac{1}{n} ( \sum_i (y_i - \bar{y}) a x_i + \sum_i (y_i - \bar{y}) b z_i )

= a \frac{1}{n} \sum_i (y_i - \bar{y}) x_i + b \frac{1}{n} \sum_i (y_i - \bar{y}) z_i

= a cov(X, Y) + b cov(Y, Z)

The last relation is also important and will be used in several sections of this book (Economy,
Numerical Methods). It also allows us to directly obtain the covariance for the sums of various
random variables.


If X, Y, Z, T are four random variables defined on the same population, we want
to compute the following covariance:

cov(3X + 5Y, 4Z - 2T)

We will develop that in two phases (this is also why we call that "bilinearity property").
First with respect to the second argument (arbitrary choice!):

cov(3X + 5Y, 4Z - 2T) = 4 cov(3X + 5Y, Z) - 2 cov(3X + 5Y, T) (7.143)

And then with respect to the first:

cov(3X + 5Y, 4Z - 2T) = 4 [3 cov(X, Z) + 5 cov(Y, Z)] - 2 [3 cov(X, T) + 5 cov(Y, T)]

So in the end:

cov(3X + 5Y, 4Z - 2T) = 12 cov(X, Z) + 20 cov(Y, Z) - 6 cov(X, T) - 10 cov(Y, T)
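The expansion above can be checked numerically in the equiprobable case. The sketch below uses four short hypothetical series and a helper `cov` of our own (covariance with divisor n); it also checks the shortcut form in which only one of the two variables is centered:

```python
from statistics import fmean

def cov(xs, ys):
    """Equiprobable (divisor n) covariance of two series of the same length."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical series measured on the same population
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 3.0]
z = [0.0, 3.0, 1.0, 2.0]
t = [5.0, 4.0, 4.0, 1.0]

# Left-hand side: covariance of the two linear combinations
lhs = cov([3 * a + 5 * b for a, b in zip(x, y)],
          [4 * c - 2 * d for c, d in zip(z, t)])
# Right-hand side: expansion by bilinearity
rhs = 12 * cov(x, z) + 20 * cov(y, z) - 6 * cov(x, t) - 10 * cov(y, t)

# Shortcut form: (1/n) * sum of x_i * (y_i - mean of y)
shortcut = sum(a * (b - fmean(y)) for a, b in zip(x, y)) / len(x)
print(lhs, rhs, cov(x, y), shortcut)
```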



Now, consider a set of random vectors X_i of components (x_1, x_2, ..., x_n)_i. The calculation
of the covariance of the components by pairs gives what is called the "covariance matrix"
(a tool widely used in finance, management and statistical numerical methods!).

Indeed, we define the component (m, n) of the covariance matrix by:

c_{m,n} = cov(X_m, X_n) = E[(X_m - \mu_{X_m})(X_n - \mu_{X_n})] (7.146)

We can therefore write a symmetric matrix (which is necessarily a square matrix) in
the form:

\Sigma = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix}

where \Sigma is the traditional letter used to denote the covariance matrix. By symmetry, and because
it is a square n by n matrix, only \frac{n(n + 1)}{2} components are needed to determine
the whole matrix (trivial but important information for when we will study structural
equation modeling in the Numerical Methods section).

This matrix has the remarkable property that if we take the set of all random vectors and we
calculate the covariance matrix, then the diagonal will obviously give us the variance of each
vector (see examples in the chapters Economics, Numerical Methods or Industrial
Engineering) because we have, as a reminder:

cov(X_m, X_m) = E[(X_m - \mu_{X_m})(X_m - \mu_{X_m})] = E[(X_m - \mu_{X_m})^2] = V(X_m) = \sigma_{X_m}^2 (7.148)
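This is why this matrix is often named the "variance-covariance matrix" and is sometimes
also written as follows:

\Sigma = \begin{pmatrix} V_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & V_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & V_{nn} \end{pmatrix}

And this is, a little bit abusively, sometimes written as:

\Sigma = \begin{pmatrix} \sigma_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & \sigma_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & \sigma_{nn} \end{pmatrix}

This matrix has the advantage of quickly showing which pairs of random variables have a
negative covariance and therefore for which pairs the variance of the sum is smaller than
the sum of the variances!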
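As a sketch, a covariance matrix can be built from a few observed series with the equiprobable (divisor n) covariance. The three series below are hypothetical and the helper names are ours:

```python
from statistics import fmean, pvariance

def cov(xs, ys):
    """Equiprobable (divisor n) covariance of two series of the same length."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def covariance_matrix(variables):
    """Matrix of the pairwise covariances c_{m,n} of a list of series."""
    return [[cov(a, b) for b in variables] for a in variables]

# Hypothetical observations of three variables (one list per variable)
series = [
    [1.0, 2.0, 4.0, 7.0],
    [2.0, 1.0, 5.0, 3.0],
    [5.0, 4.0, 4.0, 1.0],
]
sigma = covariance_matrix(series)
for row in sigma:
    print(row)
```

The diagonal contains the variances and the matrix is symmetric, so only n(n + 1)/2 of its entries actually need to be computed; the sketch computes all of them for simplicity.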


As we already mentioned, this matrix is very important and we will often see it again in
the section Economy during our study of modern portfolio theory and also for data mining
techniques in the section of Theoretical Computing (principal components analysis for
example but not only!) and also in Industrial Engineering during our study of bivariate
control charts.

Recall now that we have an axiom in probability (see section Probabilities) which states that
two events A and B are independent if and only if:

P(A \cap B) = P(A)P(B) (7.151)

Similarly, by extension, we define the independence of discrete random variables.

Definition (#93): Let X, Y be two discrete random variables. We say that X, Y are independent
if and only if:

\forall x, y \in \mathbb{R}: P(X = x, Y = y) = P(X = x)P(Y = y) (7.152)

More generally, the discrete variables X_1, X_2, ..., X_n are independent (in block) if:

\forall x_1, ..., x_n \in \mathbb{R}: P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i) (7.153)


Theorem 4.31. The independence of two random variables implies that their covariance is zero
(the converse is false!).

Proof 4.31.1. We will prove this in the case where the random variables take only a finite
number of values {x_i}_{i \in I} and {y_j}_{j \in J}, respectively, with I, J finite sets.

For the proof let us recall that:

E(XY) = \sum_{i,j} P(X = x_i, Y = y_j) x_i y_j = \sum_{i,j} P(X = x_i) P(Y = y_j) x_i y_j

= \sum_i P(X = x_i) x_i \sum_j P(Y = y_j) y_j = E(X)E(Y)

and therefore:

c_{X,Y} = E(XY) - E(X)E(Y) = E(X)E(Y) - E(X)E(Y) = 0

The smaller the covariance (in absolute value), the weaker the linear dependence between
the series. Conversely, the greater the covariance (in absolute value), the stronger the
linear dependence between the series.

Given that:

V(X + Y) = V(X) + V(Y) + 2 c_{X,Y}

and the fact that if X and Y are independent we have c_{X,Y} = 0, then:

V(X + Y) = V(X) + V(Y)

More generally, if X_1, ..., X_n are independent (in block) then for any discrete or continuous
statistical distribution law (!) we have:

V( \sum_{i=1}^{n} X_i ) = \sum_{i=1}^{n} V(X_i)

Or using the standard deviation:

\sigma_{\sum X_i} = \sqrt{ \sum_{i=1}^{n} \sigma_{X_i}^2 }

□ Q.E.D.
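To see that the converse of the theorem is false, here is a classical counter-example sketched in Python (our own illustration, not taken from the text): X takes the values -1, 0, 1 with equal probability and Y = X^2. Exact fractions are used so that no rounding is involved:

```python
from fractions import Fraction

third = Fraction(1, 3)
# Joint law of (X, Y) with X uniform on {-1, 0, 1} and Y = X**2
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov_xy = e_xy - e_x * e_y            # covariance formula E(XY) - E(X)E(Y)

# Independence would require P(X=0, Y=0) = P(X=0) P(Y=0)
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
factorizes = joint[(0, 0)] == p_x0 * p_y0

print(cov_xy, factorizes)
```

The covariance is exactly zero although Y is entirely determined by X: a zero covariance only excludes a linear relation.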

Anscombe’s famous quartet

Anscombe’s quartet comprises four datasets that have nearly identical elementary statistical
properties, yet appear very different when graphed or analyzed with undergraduate
statistics rather than high-school ones. Each dataset consists of eleven (x, y) points. They were
constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance
of graphing data before analyzing it and the effect of outliers on statistical properties. This
quartet is also used to test if an analytical tool can be accepted as "statistics-compliant" (as the
six corresponding statistics should be the minimum provided by any high-school level
analytical tool!).

The datasets are as follows. The x values are the same for the first three datasets:

Table 4.24 - Anscombe’s quartet 

The quartet is still often used to illustrate the importance of looking at a set of data graphically
before starting to analyze it according to a particular type of relation, and the inadequacy of basic
statistical properties for describing realistic datasets.

With Microsoft Excel 14.0.7166 we get: 


Figure 4.74 - Anscombe’s quartet Statistics Summary

As we can see, with elementary statistical indicators it is almost impossible to guess a difference
between the four data sets. But if we use the skewness or the kurtosis, this changes everything!

Looking to the corresponding charts we get the same conclusion: 

Figure 4.75 - Anscombe’s quartet Graphs Summary 
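The summary statistics of the quartet can be recomputed with a few lines of Python. The values below are the commonly published values of Anscombe's 1973 datasets, typed in here as an assumption (they are not copied from the table of this book), and the population skewness (third centered moment divided by the variance to the power 3/2) is our own choice of definition:

```python
import math
from statistics import fmean, variance

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def correlation(xs, ys):
    """Linear correlation coefficient of two series."""
    mx, my = fmean(xs), fmean(ys)
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cxy / math.sqrt(vx * vy)

def skewness(values):
    """Population skewness: m3 / m2**1.5 with centered moments m2 and m3."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

summary = {
    name: {
        "mean_x": fmean(x), "var_x": variance(x),   # sample variance (n - 1)
        "mean_y": fmean(y), "var_y": variance(y),
        "r": correlation(x, y), "skew_y": skewness(y),
    }
    for name, (x, y) in quartet.items()
}
for name, stats in summary.items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

The means, variances and correlations are nearly the same for the four datasets, while the skewness of y differs from one dataset to another.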

Mean and Variance of the Average

Often in statistics, it is (verrrrry!) useful to determine the standard deviation of the
sample mean and to work with it to get important analytical results in management and
manufacturing. Let’s see what it is!

Given the average of a series of terms, each determined by the measurement of several values
(it is in fact its estimator in a particular case as we will see later):

\bar{X} = \frac{1}{n} (X_1 + X_2 + ... + X_n) (7.160)

then using the properties of the mean:

E(\bar{X}) = \frac{1}{n} (E(X_1) + E(X_2) + ... + E(X_n)) (7.161)

and if all the random variables are independent and identically distributed then we have:

E(\bar{X}) = \frac{1}{n} (\mu + \mu + ... + \mu) = \frac{1}{n} n \mu = \mu (7.162)

We will prove much further below that if all the random variables are independent and
identically distributed with finite variance, then the mean follows asymptotically what
we name a "Normal distribution".

For the variance, the same reasoning applies (for independent variables):

V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{1}{n^2} (V(X_1) + V(X_2) + ... + V(X_n)) = \frac{1}{n^2} (\sigma_1^2 + \sigma_2^2 + ... + \sigma_n^2) (7.163)

And if the random variables are independent and identically distributed (we will study further
the very important case, common in practice, where the last condition is not satisfied):

V(\bar{X}) = \frac{1}{n^2} n \sigma^2 = \frac{\sigma^2}{n}

Then we get the "standard deviation of the mean" also named "standard error" or
"non-systematic variation":

\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}

and this is strictly the standard deviation of the estimator of the mean!

A more intuitive way to express the Standard Error for non-analytical
workers, managers and chief executives is the "Relative Standard Error (RSE)", which is the
Standard Error expressed as a percentage of the mean, that is:

RSE = \frac{\sigma_{\bar{X}}}{\bar{X}}

The latter is quite useful when we have to deal with many variables with different units!

The value of \sigma_{\bar{X}} is available in many software packages including Microsoft Excel charts (but there is
no built-in function in Microsoft Excel) and is written with the standard deviation (as above) or
with the notation of the variance (then we only have to take the square root...).

Note that the last relation can be used even if the means of the n random variables are not the same!
The main condition is just that the standard deviations are all equal, and this is the case in
industry (production).
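Since there is no built-in spreadsheet function, here is a minimal Python sketch of the standard error and of the relative standard error. It assumes that \sigma is estimated by the sample standard deviation (divisor n - 1, `statistics.stdev`), which is a common convention but an assumption on our part; the measures are hypothetical:

```python
import math
from statistics import fmean, stdev

def standard_error(sample):
    """Estimated standard deviation of the sample mean: s / sqrt(n)."""
    return stdev(sample) / math.sqrt(len(sample))

def relative_standard_error(sample):
    """Standard error expressed as a fraction of the sample mean."""
    return standard_error(sample) / fmean(sample)

measures = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical measures
se = standard_error(measures)
rse = relative_standard_error(measures)
print(f"SE = {se:.4f}, RSE = {100 * rse:.1f} %")
```

Quadrupling the sample size only halves the standard error, because of the square root of n in the denominator.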

We then have:

E(S_n) = n \mu and E(M_n) = \mu

V(S_n) = \sigma_{S_n}^2 = n \sigma^2 and V(M_n) = \sigma_{M_n}^2 = \frac{\sigma^2}{n}

where S_n is the sum of n independent identically distributed random variables and M_n their
estimated average.

The reduced centered variable that we introduced earlier:

Y = \frac{X - \mu}{\sigma} (7.168)

can then be written in several very useful ways:

Y = \frac{S_n - n \mu}{\sigma \sqrt{n}} = \frac{M_n - \mu}{\sigma / \sqrt{n}} (7.169)

Furthermore, assuming that the reader already knows what a Normal distribution N(\mu, \sigma) is, we
will show later in detail, because it is extremely important (!), that the random
variable \bar{X}, average of n independent and identically distributed random variables, has
for law:

\bar{X} \sim N( \mu, \frac{\sigma}{\sqrt{n}} )




Coefficient of Correlation

Now consider X and Y two random variables having for covariance:

c_{X,Y} = E[(X - \mu_X)(Y - \mu_Y)]

Theorem 4.32. We have:

(c_{X,Y})^2 \le V(X)V(Y) (7.172)

We will prove this relation immediately because the covariance alone is not always convenient
for data analysis: it is not bounded and therefore not easy to interpret. We will
construct an indicator that is easier to use in business.

Proof 4.32.1. We choose any constant a and we calculate the variance of:

aX + Y (7.173)

We can then immediately write, using the properties of the variance and of the mean:

V(aX + Y) = a^2 V(X) + V(Y) + 2 a c_{X,Y} (7.174)

The right-hand side is positive or null for any a since it is equal to a variance (left-hand side).
Seen as a polynomial in a, this expression is of the type:

P(x) = a x^2 + b x + c = a [ ( x + \frac{b}{2a} )^2 - \frac{b^2 - 4ac}{4a^2} ]

which gives here:

P(a) = V(X) a^2 + 2 c_{X,Y} a + V(Y) = V(X) [ ( a + \frac{2 c_{X,Y}}{2 V(X)} )^2 - \frac{(2 c_{X,Y})^2 - 4 V(X) V(Y)}{4 V(X)^2} ]

Because P(a) is positive or null for any a, the only possibility is that the discriminant satisfies:

(2 c_{X,Y})^2 - 4 V(X) V(Y) \le 0 (7.176)

Therefore after simplification:

(c_{X,Y})^2 \le V(X)V(Y) (7.177)

□ Q.E.D.

This gives us also:

\frac{(c_{X,Y})^2}{V(X)V(Y)} \le 1 \Leftrightarrow \frac{|c_{X,Y}|}{\sqrt{V(X)V(Y)}} \le 1

Finally we get a statistical inequality named the "Cauchy-Schwarz inequality":

-1 \le \frac{c_{X,Y}}{\sqrt{V(X)V(Y)}} \le 1




If the variances of X and Y are non-zero, the correlation between X and Y is defined by the
"linear correlation coefficient" (it is a standardized covariance so that its amplitude does not
depend on the chosen unit of measure) and written:

R_{X,Y} = \frac{c_{X,Y}}{\sqrt{V(X)V(Y)}}

Which can also be written in an expanded form (using the Huygens theorem):

R_{X,Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{V(X)V(Y)}} = \frac{E(XY) - E(X)E(Y)}{\sqrt{ (E(X^2) - E(X)^2)(E(Y^2) - E(Y)^2) }}

or more condensed:

R_{X,Y} = \frac{c_{X,Y}}{\sigma_X \sigma_Y}

Note that normally, the letter R is reserved to say that this is an estimator of the correlation
coefficient but the definition above is not an estimator (the variances do not have the
small hat...) and that, strictly speaking, we should then write \rho_{X,Y} according to the
traditions of use.

Whatever the units and the orders of magnitude, the correlation coefficient is a number between
-1 and 1 without units (so its value does not depend on the unit of measure, which is by far not
the case for all statistical indicators!). It reflects how strongly X and Y are linearly dependent
or, geometrically, how flattened the cloud of points is around a straight line. We can therefore
say that a coefficient of correlation equal to zero or close to 0 means that there is no linear
relation between the characters. But it does not imply any more general notion of independence.

When the correlation coefficient is near 1 or -1, the characters are said to be strongly correlated.
We must be careful with the frequent confusion between correlation and causality. The fact that two
phenomena are correlated does not imply in any way that one is the cause of the other.

Indeed, for any two correlated events, A and B , the different possible relationships include: 

• A causes B (direct causation); 

• B causes A (reverse causation); 

• A and B are consequences of a common cause, but do not cause each other; 

• A causes B and B causes A (bidirectional or cyclic causation); 

• A causes C which causes B (indirect causation); 

• There is no connection between A and B;

• The correlation is a coincidence. 


Coming back to the mathematical aspect of the correlation: 

• If $R_{X,Y} = -1$ we are dealing with a "pure negative correlation" (in the case of a linear relation all measurement points are located on a straight line with a negative slope).

• If $-1 < R_{X,Y} < 1$ we are dealing with a negative or positive correlation named "imperfect correlation" (in the case of a linear relation the measurement points are scattered around a straight line with a negative or positive slope respectively).

• If $R_{X,Y} = 0$ the correlation is zero... (in the case of a linear relation the measurement points are scattered around a straight line of slope zero).

• If $R_{X,Y} = 1$ we are dealing with a "pure positive correlation" (in the case of a linear relation all measurement points are located on a straight line with a positive slope).

The analysis of the correlation coefficient has the objective of determining the degree of association between variables: it is often expressed as the coefficient of determination, which is the square of the correlation coefficient. The coefficient of determination thus measures the contribution of one variable to the explanation of the other.

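Using the expressions of the mean and standard deviation of equiprobable variables as demonstrated above (keeping in mind that computing the correlation of two random variables is mainly a good idea if they are jointly Gaussian), we start from:

$$R_{X,Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{\left(E(X^2) - E(X)^2\right)\left(E(Y^2) - E(Y)^2\right)}}$$

to obtain the estimator of the coefficient of correlation:

$$R_{X,Y} = \frac{\dfrac{1}{n}\sum\limits_{i=1}^{n} x_i y_i - \left(\dfrac{1}{n}\sum\limits_{i=1}^{n} x_i\right)\left(\dfrac{1}{n}\sum\limits_{i=1}^{n} y_i\right)}{\sqrt{\left[\dfrac{1}{n}\sum\limits_{i=1}^{n} x_i^2 - \left(\dfrac{1}{n}\sum\limits_{i=1}^{n} x_i\right)^2\right]\left[\dfrac{1}{n}\sum\limits_{i=1}^{n} y_i^2 - \left(\dfrac{1}{n}\sum\limits_{i=1}^{n} y_i\right)^2\right]}}$$

where we see that the covariance becomes the average of the products minus the product of the averages.

Thus after simplification (multiplying the numerator and the denominator by $n^2$) we get a famous expression:

$$R_{X,Y} = \frac{n\sum\limits_{i=1}^{n} x_i y_i - \sum\limits_{i=1}^{n} x_i \sum\limits_{i=1}^{n} y_i}{\sqrt{\left[n\sum\limits_{i=1}^{n} x_i^2 - \left(\sum\limits_{i=1}^{n} x_i\right)^2\right]\left[n\sum\limits_{i=1}^{n} y_i^2 - \left(\sum\limits_{i=1}^{n} y_i\right)^2\right]}}$$

The correlation coefficient can be calculated in the English version of Microsoft Excel 11.8346 and others with the integrated CORREL( ) function.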
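For readers without a spreadsheet at hand, here is a minimal Python sketch of the sample correlation coefficient written with raw sums (n times the sum of products, minus the product of the sums, over the matching square root). The function name `pearson_r` and the small data sets are our own illustrative choices, not part of the original text.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient computed from raw sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    if den == 0:
        raise ValueError("correlation undefined: one sample has zero variance")
    return (n * sxy - sx * sy) / den

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(xs, [2.0 * x + 1.0 for x in xs])     # exact increasing line
r_down = pearson_r(xs, [-3.0 * x + 7.0 for x in xs])  # exact decreasing line

# A symmetric parabola: Y is fully determined by X, yet the linear correlation vanishes.
us = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_parabola = pearson_r(us, [u * u for u in us])
```

The last case illustrates the warning given earlier: a zero correlation coefficient excludes a linear relation, not dependence in general.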



We will see in the section Theoretical Computing a more general expression of the correlation.

R1. In the literature, the experimental correlation coefficient is often named the "sample Pearson coefficient" (in the equiprobable case), and when we square it we name it the "coefficient of determination".

R2. The square of the coefficient is often, somewhat improperly, interpreted as the percentage of variation in the response variable Y explained by the explanatory variable X.

Finally, note that we have the following relation which is used a lot in practice (see the section Economics for famous detailed examples!):

$$V(X + Y) = V(X) + V(Y) + 2c_{X,Y} = V(X) + V(Y) + 2R_{X,Y}\sqrt{V(X)V(Y)} \tag{7.186}$$

or the even more famous version with the standard deviation:

$$\sigma_{X+Y} = \sqrt{\sigma_X^2 + \sigma_Y^2 + 2R_{X,Y}\sigma_X\sigma_Y}$$


It is a relation that we can often see in finance in the calculation of the VaR (Value at Risk) 
according to RiskMetrics methodology proposed by JP Morgan (see section Economy). 

Let us see a small application example of the correlation but that has nothing to do with VaR (at 
least for the moment...). 



An airline company has 120 seats available that it reserves for connecting passengers from two flights that arrived earlier in the journey and who have to go on to Frankfurt. The first flight arrives from Manila and the number of passengers on board follows a Normal distribution with mean 50 and variance 169. The second flight arrives from Taipei and the number of passengers on board follows a Normal distribution with mean 45 and variance 196.

The linear correlation coefficient between the number of passengers of both flights was measured as:

$$R_{X,Y} = 0.5 \tag{7.188}$$

The law followed by the number of passengers for Frankfurt, if we assume that the law of the couple is also a Normal distribution (according to the statement!), is:

$$X + Y = \mathcal{N}(\mu_X + \mu_Y, \sigma_{X+Y}) \tag{7.189}$$

with:

$$\mu_X + \mu_Y = 50 + 45 = 95$$

$$\sigma_{X+Y} = \sqrt{\sigma_X^2 + \sigma_Y^2 + 2R_{X,Y}\sigma_X\sigma_Y} = \sqrt{169 + 196 + 2 \cdot 0.5 \cdot \sqrt{169 \cdot 196}} \approx 23.39 \tag{7.190}$$

This is a bad start for customer satisfaction in the long term...
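The closing remark presumably refers to the risk that more than 120 connecting passengers show up; that reading is our assumption, since the corresponding computation is not reproduced here. Under the stated Normal model, a short Python sketch gives the order of magnitude (the helper `normal_cdf` is our own, built on the standard error function):

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard Normal law."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean_x, var_x = 50.0, 169.0   # flight from Manila
mean_y, var_y = 45.0, 196.0   # flight from Taipei
r_xy = 0.5                    # measured linear correlation coefficient
seats = 120

mean_sum = mean_x + mean_y
# Standard deviation of a sum of two correlated variables
sd_sum = math.sqrt(var_x + var_y + 2.0 * r_xy * math.sqrt(var_x * var_y))
# Probability that the number of connecting passengers exceeds the reserved seats
p_overflow = 1.0 - normal_cdf((seats - mean_sum) / sd_sum)

print(f"mean = {mean_sum}, sd = {sd_sum:.3f}, P(X + Y > {seats}) = {p_overflow:.3f}")
```

With these numbers the probability of exceeding the 120 seats is roughly 14 %, which would explain the pessimistic comment.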



7.3.2 Continuous Variables and Moments 

Definitions (#94):

1. We say that X is a continuous variable if its "cumulative distribution function" (C.D.F.) is continuous (already defined above). The distribution function of X is defined, for $x \in \mathbb{R}$ or a truncated subset of $\mathbb{R}$, by:

$$F(x) = P(X \le x) \tag{7.194}$$

that is the cumulative probability that the random variable X is smaller than or equal to the set value x. We also have of course:

$$0 \le P(X \le x) \le 1 \tag{7.195}$$

2. We denote by:

$$G(x) = 1 - F(x) = P(X > x)$$

the "survival function" or "tail function".

3. If furthermore the distribution function F of X is continuously differentiable with derivative f (sometimes denoted by p), named "density function" or "mass function" or just simply "distribution function", then we say that X is absolutely continuous and in this case we have:

$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx = F(x_2) - F(x_1) \tag{7.197}$$

with the normalization condition:

$$P(X \le +\infty) = \int_{-\infty}^{+\infty} f(x)\,dx = \int_{-\infty}^{+\infty} dF(x) = 1$$

Any probability distribution function must satisfy the normalization integral over its domain of definition!

It is interesting to note that the definition implies that the probability that a continuous random variable takes exactly a given value is zero (it tends to zero as the interval around that value shrinks)! So it is not because an event has a zero probability that it cannot happen!

The average being defined by a sum weighted by probabilities for a discrete variable, it becomes an integral for a continuous variable:

$$E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx$$



and therefore the variance is written as:

$$V(X) = \int_{-\infty}^{+\infty} [x - E(X)]^2 f(x)\,dx$$

Then we also have the median, which is logically redefined in the case of a continuous random variable by:

$$P(X \le M_e) = \int_{-\infty}^{M_e} f(x)\,dx = 0.5$$

and it rarely coincides with the average!

And the modal value is given by the value of x where:

$$\frac{df(x)}{dx} = 0$$



Statisticians often use the following notations for the expected mean of a continuous variable:

$$E(X),\ M(X),\ \mu_X,\ \mu \tag{7.203}$$

and for the variance:

$$V(X),\ S(X),\ \sigma_X^2,\ \sigma^2 \tag{7.204}$$

These are the same as for the moments of a discrete variable.

Thereafter, we will calculate these different moments indicators with detailed proofs only for 
the most used cases. 

7.4 Fundamental postulate of statistics 

One of the ultimate goals of statistics is, starting from a sample, to find the analytical distribution function that gave birth to the sample. This goal will be presented here as a postulate (although this assumption is very difficult to apply in practice).

Postulate: To any empirical distribution function $F_n(x)$ built from n measurements of the random variable X we can associate a theoretical distribution function $F(x)$ to which it converges when the sample size is large enough. More precisely, if:

$$X_n = \sup_x \left| F_n(x) - F(x) \right| \tag{7.205}$$

is the random variable defined as the largest difference (in absolute value) between $F_n(x)$ and $F(x)$ (observed over all values of x for a given sample), then $X_n$ converges to 0 almost surely.
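The convergence stated in the postulate can be observed numerically. The following Python sketch is only an illustration under our own assumptions: the theoretical law is taken to be the continuous uniform law on [0, 1] (so that F(x) = x), and the seed, sample sizes and function name are arbitrary choices.

```python
import random

def sup_distance_uniform(sample):
    """Largest gap between the empirical CDF of `sample` and F(x) = x on [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    # The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted value,
    # so the supremum is reached just before or just at one of the jumps.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(12345)  # fixed seed so that the run is reproducible
distances = {
    n: sup_distance_uniform([rng.random() for _ in range(n)])
    for n in (10, 100, 1000, 10000)
}

for n, d in distances.items():
    print(f"n = {n:>5}: sup |F_n - F| = {d:.4f}")
```

The printed distances shrink as n grows, which is the behaviour the postulate describes; a single simulated run is of course an illustration, not a proof.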




Mathematical statisticians prove this postulate rigorously as a theorem named the "fundamental theorem of statistics" or the "Glivenko-Cantelli theorem" regarding continuous functions. Personally, even if we offend the experts, we think that this proof is not really one, because it is very far away from practical reality (yes, this is our physicist side that emerges...), and this theoretical result leads many practitioners to do their utmost (excluding data, transformations and other abominations) to find a known distribution law that they can fit to their measured data.

7.5 Diversity Index 

It happens in the field of biology or in business that a statistician or analyst is asked to measure the diversity of a number of predefined elements. For example, imagine a multinational with a range of well-defined products, where each of the stores (customers) in the world can choose a subset of this range for its sales. The request is then to rank the stores according to how wide a range of branded products they sell, also taking the quantities into account.

For example, we have a total of 4 products in our catalog. By chance, three of our customers sell all 4 products, but we would like to know which customer sells the greatest diversity, taking the quantities into account.

We have the following sales data by product for customer 1:

Customer 1     Quantity sold
Product 1      5
Product 2      5
Product 3      5
Product 4      5

For customer 2:

Customer 2     Quantity sold
Product 1      1
Product 2      1
Product 3      1
Product 4      17

and for customer 3:

Customer 3     Quantity sold
Product 1      2
Product 2      2
Product 3      2
Product 4      34
A measure of information (diversity of states) that is well suited to this purpose is the Shannon formula introduced in the section of Statistical Mechanics, which is:

$$S(x) = E(h(x)) = -\lambda \sum_i p_i \log(p_i) \tag{7.206}$$

Arbitrarily, we will take $\lambda = 1$ and the logarithm in base 10 (so, if we have 10 equiprobable states, the entropy is equal to one for example...).

Therefore we have:

$$S(x) = -\sum_i p_i \log_{10}(p_i) \tag{7.207}$$

We will rewrite this more adequately for the application in business. Thus, if n is the number of products, $f_i$ the sales of product i and $p_i$ the proportion (or "relative frequency") of sales of product i among all sales N:

$$p_i = \frac{f_i}{N} \tag{7.208}$$

Then we have:

$$\begin{aligned}
S(x) &= -\sum_{i=1}^{n} \frac{f_i}{N}\log_{10}\left(\frac{f_i}{N}\right) = -\frac{1}{N}\sum_{i=1}^{n} f_i\left[\log_{10}(f_i) - \log_{10}(N)\right] \\
&= \frac{\sum\limits_{i=1}^{n} f_i\left[\log_{10}(N) - \log_{10}(f_i)\right]}{N} = \frac{\log_{10}(N)\sum\limits_{i=1}^{n} f_i - \sum\limits_{i=1}^{n} f_i\log_{10}(f_i)}{N} \\
&= \frac{N\log_{10}(N) - \sum\limits_{i=1}^{n} f_i\log_{10}(f_i)}{N}
\end{aligned} \tag{7.209}$$

This gives for customer 1 (we stay in base 10 for the logarithm):

$$\begin{aligned}
S &= \frac{N\log(N) - \sum_i f_i\log(f_i)}{N} = \frac{20\log(20) - (5\log(5) + 5\log(5) + 5\log(5) + 5\log(5))}{20} \\
&= \frac{20\log(20) - 20\log(5)}{20} = \log(20) - \log(5) = \log(4) \approx 0.602
\end{aligned}$$


which is the maximum possible value (each state is equally likely). And for customer 2 we have:

$$S = \frac{N\log(N) - \sum_i f_i\log(f_i)}{N} = \frac{20\log(20) - (1\log(1) + 1\log(1) + 1\log(1) + 17\log(17))}{20} \approx 0.255$$

And finally for customer 3:

$$S = \frac{N\log(N) - \sum_i f_i\log(f_i)}{N} = \frac{40\log(40) - (2\log(2) + 2\log(2) + 2\log(2) + 34\log(34))}{40} \approx 0.255$$

Thus, the customer with the greatest diversity is the first one. We also see with customers 2 and 3 an interesting property of the Shannon formula: the overall quantity does not affect the diversity (the only difference between these two customers is that all quantities are multiplied by a factor of 2, which leaves the proportions unchanged)!
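The three diversity indices can be checked with a few lines of Python. This is a minimal sketch using the quantities that appear in the calculations above; the function name `shannon_diversity` and the dictionary layout are our own choices.

```python
import math

def shannon_diversity(counts):
    """Base-10 Shannon index (N log N - sum f_i log f_i) / N for a list of sales counts."""
    n_total = sum(counts)
    if n_total <= 0:
        raise ValueError("at least one sale is required")
    # Products with zero sales contribute nothing (f log f -> 0 as f -> 0)
    weighted = sum(f * math.log10(f) for f in counts if f > 0)
    return (n_total * math.log10(n_total) - weighted) / n_total

sales = {
    "customer 1": [5, 5, 5, 5],
    "customer 2": [1, 1, 1, 17],
    "customer 3": [2, 2, 2, 34],
}
diversity = {name: shannon_diversity(counts) for name, counts in sales.items()}

for name, value in diversity.items():
    print(f"{name}: {value:.3f}")
```

Customers 2 and 3 get the same index because only the proportions enter the formula, not the total volume.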

7.6 Distribution Functions (probabilities laws) 

When we observe probabilistic phenomena, note the values they take and plot them, we can observe that the individual measurements follow a typical characteristic which can sometimes be fitted theoretically with a good level of quality.

In the field of probabilities and statistics, we call these characteristics "distribution functions" because they indicate the frequency with which the random variable appears for given values.

We sometimes simply use the term "function" or "law" to describe these characteristics.

These functions are in practice bounded by what we name the "range of the distribution", which is the difference between the maximum value (on the right) and the minimum value (on the left) of the observed values:

$$R = \max - \min \tag{7.213}$$

In theory they are not necessarily bounded and then we talk (see section Functional Analysis) 
about a "domain of definition" or more simply about the "support" of the function. 

If the observed values are distributed in a certain way then there is a probability (or "cumulative 
probability" in the case of continuous distribution functions) to have a certain value of the 
distribution function. 


In industrial practice (see section Industrial Engineering), the range of statistical values is important (as well as the standard deviation) because it gives an indication of the variation of a process (variability).

If L denotes any possible univariate distribution function, the range of the function is simply denoted by L if its domain of definition is $\mathbb{R}$; otherwise, if it is bounded, you will typically see something like $L_{]a,b]}$.

Definitions (#95):

D1. The mathematical relation that gives the probability that a random variable takes a given value is named the "density function" (or "probability density function"), "mass function" or "marginal function".

D2. The mathematical relation that gives the cumulative probability that a random variable is lower than or equal to a certain value is referred to as the "repartition function", "cumulative function" or "cumulative distribution function".

D3. Random variables are "independent and identically distributed (i.i.d.)" if they all follow the same distribution function, with the same parameter values, and are independent.

There are very many such functions; we therefore offer the reader a detailed study of only the best-known ones.

Before going any further, it may be useful to know that if X is a continuous or discrete random variable, there are several traditions of notation in the literature to indicate that it follows a given probability distribution L. Here are the most common:


X = L 
X ^ L 


X = L 

In this section and throughout the book in general, we will use the last notation! 

Here is the list of the distribution functions that we will see here, as well as of the distribution functions commonly used in industry that are located in other chapters or sections, and of those whose proof has yet to be written:

• Discrete Uniform Distribution $\mathcal{U}(a, b)$ (see below)

• Bernoulli Distribution $\mathcal{B}(1, p)$ (see below)

• Geometric Distribution $\mathcal{G}(N)$ (see below)

• Binomial Distribution $\mathcal{B}(N, k)$ (see below)

• Negative Binomial Distribution $\mathcal{NB}(N, k, p)$ (see below)

• Hypergeometric Distribution $\mathcal{H}(n, p, m, k)$ (see below)

• Multinomial Distribution (see below)

• Poisson Distribution $\mathcal{P}(\mu, k)$ (see below)

• Gauss-Laplace/Normal Distribution $\mathcal{N}(\mu, \sigma)$ (see below)



• Log-Normal Distribution $\mathcal{LN}(\mu, \sigma)$ (see below)

• Continuous Uniform Distribution (see below) 

• Triangular Distribution (see below) 

• Pareto Distribution (see below) 

• Exponential Distribution (see below) 

• Weibull Distribution (see section Industrial Engineering) 

• Generalized Exponential Distribution (see section Theoretical Computing) 

• Erlang/Erlang-B/Erlang-C Distributions (see section Quantitative Management) 

• Cauchy Distribution (see below) 

• Beta Distribution (below and section Quantitative Management) 

• Gamma Distribution (see below) 

• Chi-2 Distribution (see below) 

• Student Distribution (see below) 

• Fisher-Snedecor Distribution (see below) 

• Benford Distribution (see below) 

• Logistic Distribution (see section Theoretical Computing) 

• Square Gauss distribution (still must be written) 

• Extreme value distribution (still must be written) 


The reader will find the mathematical developments of the Weibull distribution function 
in the section on Industrial Engineering (Engineering chapter), and the logistic distribu- 
tion function in the section of Theoretical Computing. 

7.6.1 Discrete Uniform Distribution 

If we accept that it is possible to associate a probability with an event, we can conceive of situations where we can assume a priori that all elementary events are equally likely (that is to say, they have the same probability of occurring). We then use the ratio between the number of favorable cases and the number of possible cases to calculate the probability of all events in the Universe of events U. More generally, if U is a finite set of equally likely events and A is a part of U, then we have, using set theory notation (see section Set Theory):

$$P(A) = \frac{\mathrm{Card}(A)}{\mathrm{Card}(U)} = \frac{\#A}{\#U}$$

More commonly, if e is an event that may have N equally likely possible outcomes, then the probability of observing a given outcome of this event follows a "discrete uniform function" (or "discrete uniform law") given by the relation:

$$p_e = \frac{1}{N}$$

whose mean (or average) is given by:

$$E(X) = \sum_{i=1}^{N} p_i x_i = \sum_{i=1}^{N} p_e x_i = p_e \sum_{i=1}^{N} x_i = \frac{1}{N}\sum_{i=1}^{N} x_i$$



If we put ourselves in the particular case where $x_i = i$ with $i = 1 \ldots N$, we then have (see section Sequences And Series):

$$E(X) = \frac{1}{N}\sum_{i=1}^{N} i = \frac{1}{N}\,\frac{N(N+1)}{2} = \frac{N+1}{2}$$

If the random variable takes all integer values in [a, b] (another special case), the distribution is then denoted by $\mathcal{U}(a, b)$ and it should be obvious that we have for the expected mean:

$$\begin{aligned}
E(X) &= \sum_{i=a}^{b} p_e\, i = p_e\sum_{i=a}^{b} i = \frac{1}{b-a+1}\left(\sum_{i=1}^{b} i - \sum_{i=1}^{a-1} i\right) \\
&= \frac{1}{b-a+1}\left(\frac{b(b+1)}{2} - \frac{(a-1)((a-1)+1)}{2}\right) = \frac{1}{b-a+1}\,\frac{b(b+1) - a(a-1)}{2} \\
&= \frac{1}{b-a+1}\,\frac{(b-a+1)(b+a)}{2} = \frac{a+b}{2}
\end{aligned}$$


For the variance we have (always using the results of the section on Sequences and Series, and writing $\mu = E(X) = (N+1)/2$):

$$\begin{aligned}
V(X) &= \sum_{i=1}^{N} p_i (x_i - \mu)^2 = \sum_{i=1}^{N} p_e (i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} (i - \mu)^2 \\
&= \frac{1}{N}\left(\sum_{i=1}^{N} i^2 - 2\mu\sum_{i=1}^{N} i + \sum_{i=1}^{N} \mu^2\right) = \frac{1}{N}\sum_{i=1}^{N} i^2 - \frac{2\mu}{N}\sum_{i=1}^{N} i + \mu^2 \\
&= \frac{1}{N}\,\frac{N(N+1)(2N+1)}{6} - \frac{N+1}{N}\,\frac{N(N+1)}{2} + \frac{(N+1)^2}{4} \\
&= \frac{(N+1)(2N+1)}{6} - \frac{(N+1)^2}{2} + \frac{(N+1)^2}{4} = \frac{(N+1)(2N+1)}{6} - \frac{(N+1)^2}{4} \\
&= \frac{2(N+1)(2N+1) - 3(N+1)^2}{12} = \frac{2(2N^2 + N + 2N + 1) - 3N^2 - 6N - 3}{12} \\
&= \frac{4N^2 + 6N + 2 - 3N^2 - 6N - 3}{12} = \frac{N^2 - 1}{12}
\end{aligned}$$


If the random variable takes all integer values in [a, b] (another special case), the distribution being then denoted by $\mathcal{U}(a, b)$, the number of possible values is $N = b - a + 1$ and it should be obvious that we have for the variance:

$$V(X) = \frac{N^2 - 1}{12} = \frac{(b - a + 1)^2 - 1}{12}$$

By symmetry of the distribution, if all values of the domain of definition [a, b] are taken by the random variable, we have for the median:

$$M_e = E(X) = \frac{a + b}{2}$$


Here is an example plot of the mass function and of the cumulative distribution function, respectively, for the discrete uniform law on the values {1, 5, 8, 11, 12} (we see that each value is equally likely):

Figure 4.76 - Uniform law U (density and cumulative distribution function)

As we can see in the above diagram, the cumulative distribution function can be written:

$$F(x) = P(X \le x) = \frac{\#(i : x_i \le x)}{N}$$

For sure the discrete uniform distribution has no specific modal value $M_0$!
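The closed forms obtained above for the mean and the variance of the discrete uniform law on the integers a, ..., b can be checked by brute force. The sketch below uses exact rational arithmetic; the function name `uniform_moments` and the two test intervals are our own illustrative choices.

```python
from fractions import Fraction

def uniform_moments(a, b):
    """Exact mean and variance of the discrete uniform law on the integers a..b."""
    values = range(a, b + 1)
    n = len(values)
    mean = Fraction(sum(values), n)
    var = sum((Fraction(v) - mean) ** 2 for v in values) / n
    return mean, var

# A six-sided die (a = 1, b = 6) and an arbitrary interval (a = 3, b = 10)
die_mean, die_var = uniform_moments(1, 6)
mean_3_10, var_3_10 = uniform_moments(3, 10)

# Closed forms derived in the text: (a + b) / 2 and ((b - a + 1)^2 - 1) / 12
closed_mean = Fraction(3 + 10, 2)
closed_var = Fraction((10 - 3 + 1) ** 2 - 1, 12)
print(die_mean, die_var, mean_3_10, var_3_10)
```

Both direct computations agree with the formulas, including for an interval that does not start at 1.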



7.6.2 Bernoulli Distribution 

If we are dealing with a binary observation, then the probability of an event is constant from one observation to the other if there is no memory effect (in other words: the observations are Bernoulli variables, pairwise independent).

We name this kind of observation, where the random variable takes the value 0 (false) or 1 (true) with probability q = 1 - p and p respectively, "Bernoulli trials" with "contrary events with contrary probabilities".

Thus, a random variable X follows a "Bernoulli function" $\mathcal{B}(1, p)$ (or "Bernoulli law") if it can take only the values 0 or 1, associated with probabilities q and p such that q + p = 1 and:

$$P(X = 0) = q \qquad P(X = 1) = p = 1 - q$$

The classic example of such a process is the coin toss (heads or tails), or sampling with replacement, or sampling that can be considered as such (this last case is very important in industrial practice). There is certainly no need for the reader to formally verify that the cumulative probability is unitary...

The introduction above is perhaps not relevant for business, but we will see in the section of Quantitative Techniques that the Bernoulli function naturally appears at the beginning of our study of queuing theory.

Note that, by extension, if we consider N events where we get, in a particular order, k times one of the possible outcomes (success) and N - k times the other one (failure), then the probability of such a series (k successes and N - k failures ordered in one particular way) is given by:

$$P(N, k) = p^k (1 - p)^{N-k} = p^k q^{N-k} \tag{7.225}$$

with $N \in \mathbb{N}^*$, according to what we got during the study of combinatorics in the section of Probabilities!

Here is an example plot of the cumulative distribution function for q = 0.3: 


Figure 4.77 - Bernoulli law B (cumulative distribution function)



The Bernoulli function therefore has for expected mean (average), choosing p as the probability of the event of interest:

$$\mu = E(X) = \sum_i p_i x_i = p \cdot 1 + (1 - p) \cdot 0 = p \tag{7.226}$$

and for variance (we use the Huygens theorem proved above):

$$V(X) = \sigma^2 = E(X^2) - E(X)^2 = p - p^2 = p(1 - p) = pq \tag{7.227}$$

The modal value $M_0$ of the Bernoulli law depends on the values of p and q. So we have (it should be obvious to the reader):

$$\begin{aligned}
M_0 &= 0 && \text{if } p < q \\
M_0 &= \{0, 1\} && \text{if } p = q \\
M_0 &= 1 && \text{if } p > q
\end{aligned} \tag{7.228}$$

For sure the Bernoulli distribution has no specific median value $M_e$!


7.6.3 Geometric Distribution

The geometric law $\mathcal{G}(N)$ or "Pascal's law" consists of a Bernoulli trial, where the probability of success p and that of failure q = 1 - p are constant, that we repeat independently until the first success.

Remember that during our presentation of the Bernoulli law we deduced an extension to N trials such that:

$$P(N, k) = p^k (1 - p)^{N-k} = p^k q^{N-k}$$

Therefore the probability of getting the first success (k = 1) at the N-th trial is:

$$G(N) = p(1 - p)^{N-1} = pq^{N-1}$$

with $N \in \mathbb{N}^*$.

As you can see, the greater N is, the smaller the probability G(N). This may seem illogical, but in fact it is not! Indeed, in the sentence "the probability to get the first success after N trials", you must not forget that it is written "after" and not "during".

Therefore, for sure... the probability of having N - 1 failures followed by 1 success always gets smaller as N increases (a look at the figure a little further below, for p = 0.5, can help to understand).

This law has for expected mean:

$$\mu = E(X) = \sum_i p_i x_i = \sum_{N=1}^{+\infty} pq^{N-1} N = \sum_{N=1}^{+\infty} p(1 - p)^{N-1} N = p\sum_{N=1}^{+\infty} (1 - p)^{N-1} N \tag{7.231}$$


However, the last sum can be evaluated using the relation:

$$\sum_{k=1}^{+\infty} k q^{k-1} = \frac{1}{(1 - q)^2} \tag{7.232}$$

Indeed, we proved in the section of Sequences and Series, during our study of geometric series, that:

$$\sum_{k=0}^{n} q^k = \frac{1 - q^{n+1}}{1 - q}$$

Taking the limit $n \to +\infty$ we get:

$$\sum_{k=0}^{+\infty} q^k = \frac{1}{1 - q} \tag{7.234}$$

because 0 < q < 1. Then we just differentiate both sides of the equality with respect to q and we get:

$$\sum_{k=1}^{+\infty} k q^{k-1} = \frac{1}{(1 - q)^2}$$

This done, let us continue.

We then have the average number of trials X it takes to get the first success (or in other words, the expected rank - the expected number of trials - of the first success):

$$E(X) = \sum_{N=0}^{+\infty} N P(X = N) = \sum_{N=0}^{+\infty} N p q^{N-1} = \frac{p}{(1 - q)^2} = \frac{p}{p^2} = \frac{1}{p}$$


Now we calculate the variance, recalling once again (Huygens theorem) that:

$$V(X) = E(X^2) - E(X)^2 \tag{7.237}$$

So let us start by calculating $E(X^2)$:

$$\begin{aligned}
E(X^2) &= \sum_{N=0}^{+\infty} N^2 P(X = N) = p\sum_{N=0}^{+\infty} N^2 q^{N-1} = p\sum_{N=0}^{+\infty} N(N - 1 + 1) q^{N-1} \\
&= p\sum_{N=1}^{+\infty} N(N - 1) q^{N-1} + p\sum_{N=0}^{+\infty} N q^{N-1}
\end{aligned}$$

The last term of this expression is equal to the expected mean calculated previously. Thus:

$$p\sum_{N=0}^{+\infty} N q^{N-1} = \frac{1}{p} \tag{7.239}$$

It remains to calculate:

$$p\sum_{N=1}^{+\infty} N(N - 1) q^{N-1}$$


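We have:

$$p\sum_{N=1}^{+\infty} N(N - 1) q^{N-1} = pq\sum_{N=2}^{+\infty} N(N - 1) q^{N-2} \tag{7.241}$$

But differentiating the following equality:

$$\sum_{k=1}^{+\infty} k q^{k-1} = \frac{1}{(1 - q)^2} \tag{7.242}$$

We get:

$$\sum_{k=2}^{+\infty} k(k - 1) q^{k-2} = \frac{2}{(1 - q)^3} \tag{7.243}$$

Therefore:

$$pq\sum_{N=2}^{+\infty} N(N - 1) q^{N-2} = pq\,\frac{2}{(1 - q)^3} = pq\,\frac{2}{p^3} = \frac{2q}{p^2}$$

Thus:

$$E(X^2) = \frac{2q}{p^2} + \frac{1}{p} \tag{7.245}$$

Finally, the variance of the rank of the first success (i.e. the variance of the number of trials needed to get the first success) is:

$$V(X) = \sigma^2 = E(X^2) - E(X)^2 = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{2q + p - 1}{p^2} = \frac{q}{p^2} \tag{7.246}$$

The modal value is easy to get because we need to find the value of N that maximizes the expression of the geometric law:

$$pq^{N-1} \tag{7.247}$$

and we hope that it is immediate to the reader that, since 0 < q < 1, this is satisfied when N = 1. The modal value is therefore $M_0 = 1$, and it is reached with probability:

$$G(1) = pq^{1-1} = pq^0 = p \tag{7.248}$$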

Now let us determine the median $M_e$ to finish. For this, by definition we know we must have:

$$\sum_{N=1}^{M_e} pq^{N-1} = \sum_{N=M_e}^{+\infty} pq^{N-1} = 0.5$$

But we can rewrite:

$$\sum_{N=M_e}^{+\infty} pq^{N-1} = pq^{M_e-1}\sum_{N=0}^{+\infty} q^N = pq^{M_e-1}\,\frac{1}{1 - q} = pq^{M_e-1}\,\frac{1}{p} = q^{M_e-1} = 0.5$$

Therefore (in base 10):

$$\log\left(q^{M_e-1}\right) = (M_e - 1)\log(q) = \log(0.5) \tag{7.251}$$


Finally, based on our definition of the median, we get:

$$M_e = \frac{\log(0.5)}{\log(q)} + 1$$

Now we determine the cumulative function of the geometric law. We start from:

$$G(N) = pq^{N-1} \tag{7.253}$$

Then we have by definition the cumulative probability that the experiment succeeds within the first N trials:

$$P(X \le N) = 1 - \sum_{j=N+1}^{+\infty} pq^{j-1} = 1 - p\sum_{j=N+1}^{+\infty} q^{j-1}$$

with N being for sure an integer taking the values 0, 1, 2, .... We write:

$$j - 1 = N + k \Rightarrow k = j - N - 1$$

We then have for the CDF:

$$\begin{aligned}
P(X \le N) &= 1 - p\sum_{k=0}^{+\infty} q^{N+k} = 1 - p\sum_{k=0}^{+\infty} q^N q^k = 1 - pq^N\sum_{k=0}^{+\infty} q^k \\
&= 1 - pq^N\,\frac{1}{1 - q} = 1 - (1 - q)q^N\,\frac{1}{1 - q} = 1 - q^N
\end{aligned} \tag{7.256}$$


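Late at night and in the dark, you try to open a lock with a bunch of five keys. Because you are a little tired (or a little tipsy...), you try the keys without paying attention to which ones you have already used. Knowing that only one key will work, what is the probability of using the right key at the N-th attempt?

The solution is (each attempt being a Bernoulli trial with p = 1/5):

$$G(N) = pq^{N-1} = \frac{1}{5}\left(\frac{4}{5}\right)^{N-1}$$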

Plot of the mass function and cumulative distribution function for the Geometric distribution 
with parameter p = 0.5: 



Figure 4.78 - Geometric law Q (mass and cumulative distribution function) 
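As a numerical cross-check of the results of this subsection (mass function, cumulative function 1 - q^N, mean 1/p and variance q/p^2), here is a small Python sketch. The value p = 1/5 is taken from the key example; the function names and the truncation of the infinite sums at 2000 terms are our own choices (the neglected tail is far below floating-point precision for this p).

```python
def geom_pmf(n, p):
    """Probability that the first success occurs exactly at trial n (n = 1, 2, ...)."""
    return p * (1.0 - p) ** (n - 1)

def geom_cdf(n, p):
    """Probability that the first success occurs within the first n trials."""
    return 1.0 - (1.0 - p) ** n

p = 0.2            # five keys, only one of which opens the lock
q = 1.0 - p
trials = range(1, 2001)  # truncated support standing in for the infinite sums

total = sum(geom_pmf(n, p) for n in trials)
mean = sum(n * geom_pmf(n, p) for n in trials)
second_moment = sum(n * n * geom_pmf(n, p) for n in trials)
variance = second_moment - mean ** 2            # Huygens theorem
cdf_by_sum = sum(geom_pmf(n, p) for n in range(1, 4))

print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
print(f"P(X <= 3) = {geom_cdf(3, p):.4f}")
```

For this example the lock is expected to open after 5 attempts on average, with a variance of 20.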

7.6.4 Binomial Distribution 

We come back now to our Bernoulli experiment. More generally, any particular N-tuple consisting of k successes and N - k failures will have the probability (for sampling with replacement, or without replacement if the population is large... as a first approximation):

$$P(N, k) = p^k (1 - p)^{N-k} = p^k q^{N-k} \tag{7.258}$$

of being drawn (or of appearing), whatever the order of appearance of the successes and failures (the reader will perhaps have noticed that this is a generalization of the geometric distribution: just write k = 1 to get the geometric distribution back).

But we know that combinatorics gives the number of N-tuples of this type (the number of ways to order the appearance of failures and successes). The number of possible arrangements is, as we proved (see section Probabilities), given by the binomial coefficient (we recall that the notation in this book does not comply with ISO standard 31-11):

$$C_k^N = \frac{N!}{k!(N - k)!}$$

So, as the probability of obtaining a given series of k successes and N - k failures is always the same (regardless of the order), we just have to multiply the probability of a particular series by the binomial coefficient (this is equivalent to a sum) such that:

$$P(N, k) = C_k^N p^k (1 - p)^{N-k} = C_k^N p^k q^{N-k} \tag{7.260}$$

to get the total probability of obtaining any one of these possible series (since each is possible).



This is equivalent to the study of sampling with simple replacement (see Probabilities) with a constraint on the order, or to the study of a series of successes and failures. We will use this relation in the context of queuing theory or reliability (see section Industrial Engineering). Note that in the case of large populations, even if the sampling is without replacement, it can be considered as being with replacement...

Written in another way, this gives the "binomial function" (or "binomial law"), that is the following distribution function:

$$\mathcal{B}(N, k) = C_k^N p^k (1 - p)^{N-k} = C_k^N p^k q^{N-k} = \binom{N}{k} p^k q^{N-k}$$

It is sometimes also denoted by $\mathcal{B}(n, p)$ with a lowercase n or an uppercase N (it does not really matter...) and can be calculated in the English version of Microsoft Excel 11.8346 using the BINOMDIST( ) function.

We sometimes say that the binomial law is not exhaustive, as the size of the initial population does not appear in the expression of the law.


The Binomial distribution is named "Symmetric Binomial Distribution" when p = 0.5. 


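We want to test the alternator of a generator. The probability of failure on demand of this equipment is estimated at 1 failure per 1,000 starts.

We decide to test 100 starts. The probability of observing exactly one failure in this test is:

$$\begin{aligned}
\mathcal{B}(N = 100, k = 1) &= C_k^N p^k (1 - p)^{N-k} = \frac{N!}{k!(N - k)!}\, p^k (1 - p)^{N-k} \\
&= \frac{100!}{1!(100 - 1)!}\left(\frac{1}{1000}\right)^1\left(1 - \frac{1}{1000}\right)^{100-1} \approx 9\%
\end{aligned}$$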
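The same number can be obtained without a spreadsheet. The following Python sketch evaluates the binomial mass function and its cumulative sum for the alternator example; the function names `binom_pmf` and `binom_cdf` are our own, and `math.comb` requires Python 3.8 or later.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials of probability p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def binom_cdf(k, n, p):
    """Probability of at most k successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n_starts, p_fail = 100, 1.0 / 1000.0

p_exactly_one = binom_pmf(1, n_starts, p_fail)   # the case worked out in the text
p_at_most_one = binom_cdf(1, n_starts, p_fail)   # zero or one failure
p_total = binom_cdf(n_starts, n_starts, p_fail)  # all outcomes together

print(f"P(exactly 1 failure) = {p_exactly_one:.4f}")
print(f"P(at most 1 failure) = {p_at_most_one:.4f}")
```

The cumulative variant is the one typically needed in batch control, where the question is usually "at most k defects" rather than "exactly k".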

We obviously have for the cumulative distribution function (very useful in practice for supplier batch control or reliability, as we will see in the section of Industrial Engineering!):

$$\sum_{k=0}^{N} C_k^N p^k (1 - p)^{N-k} = 1 \tag{7.263}$$

Indeed, we have proved in the section of Calculus the "binomial theorem":

$$(x + y)^n = \sum_{k=0}^{n} C_k^n x^k y^{n-k} \tag{7.264}$$

Therefore:

$$\sum_{k=0}^{N} C_k^N p^k (1 - p)^{N-k} = (p + (1 - p))^N = 1^N = 1 \tag{7.265}$$

Rather than calculating such cumulative probabilities by hand, it is better to use Microsoft Excel 11.8346 (or any other widely known software) with the CRITBINOM( ) function.

The expected mean (average) of $\mathcal{B}(N, k)$ is given by:

$$E(X) = \sum_{k=0}^{N} P(X = k)\,k = \sum_{k=0}^{N} C_k^N p^k (1 - p)^{N-k} k = \sum_{k=1}^{N} C_k^N p^k (1 - p)^{N-k} k$$

But having:

$$C_k^N = \frac{N}{k}\, C_{k-1}^{N-1}$$

We finally get:

$$\begin{aligned}
E(X) &= \sum_{k=1}^{N} C_k^N p^k (1 - p)^{N-k} k = N\sum_{k=1}^{N} C_{k-1}^{N-1} p^k (1 - p)^{N-k} \\
&= Np\sum_{k=1}^{N} C_{k-1}^{N-1} p^{k-1} (1 - p)^{(N-1)-(k-1)} = Np\sum_{k=0}^{N-1} C_k^{N-1} p^k (1 - p)^{(N-1)-k} \\
&= Np\,(p + (1 - p))^{N-1} = Np
\end{aligned}$$

that gives the average number of times that we will get the desired outcome of probability p in N trials.

The mean of the binomial distribution is sometimes written in the specialized literature in the following way, if r is the number of elements with the expected outcome in a population of size n:

$$E(X) = Np = N\frac{r}{n}$$

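Before calculating the variance, we need to introduce the following equality:

$$\sum_{s=0}^{N-1} s \frac{(N-1)!}{s!(N-1-s)!} p^s q^{N-1-s} = (N-1)p$$

Indeed, let us prove this relation using the previous developments:

$$\begin{aligned}
\sum_{s=0}^{N-1} s \frac{(N-1)!}{s!(N-1-s)!} p^s q^{N-1-s}
&= \sum_{s=1}^{N-1} s \frac{(N-1)!}{s(s-1)!(N-1-s)!} p^s q^{N-1-s} \\
&= \sum_{s=1}^{N-1} \frac{(N-1)!}{(s-1)!(N-1-s)!} p^s q^{N-1-s} \\
&= (N-1) \sum_{s=1}^{N-1} \frac{(N-2)!}{(s-1)!(N-1-s)!} p^s q^{N-1-s} \\
&= (N-1) \sum_{j=0}^{N-2} \frac{(N-2)!}{j!(N-2-j)!} p^{j+1} q^{N-2-j} \\
&= (N-1)p \sum_{j=0}^{N-2} \frac{(N-2)!}{j!(N-2-j)!} p^{j} q^{N-2-j} \\
&= (N-1)p \sum_{j=0}^{N-2} C_{N-2}^{j} p^{j} q^{N-2-j}
\end{aligned}$$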



We recognize in the last equality the cumulative distribution function that is equal to 1. Therefore:

$$(N-1)p \sum_{j=0}^{N-2} C_{N-2}^{j} p^{j} q^{N-2-j} = (N-1)p \cdot 1 = (N-1)p \tag{7.272}$$

We start now the (long) calculation of the variance of the binomial distribution by using the
previous results:

$$\begin{aligned}
V(k) &= \sum_{k=0}^{N} (k-\mu)^2 \frac{N!}{k!(N-k)!} p^k q^{N-k} \\
&= \sum_{k=0}^{N} k^2 \frac{N!}{k!(N-k)!} p^k q^{N-k} + \mu^2 \underbrace{\sum_{k=0}^{N} \frac{N!}{k!(N-k)!} p^k q^{N-k}}_{=1} - 2\mu \underbrace{\sum_{k=0}^{N} k \frac{N!}{k!(N-k)!} p^k q^{N-k}}_{=\mu=Np} \\
&= -\mu^2 + \sum_{k=1}^{N} k^2 \frac{N!}{k!(N-k)!} p^k q^{N-k} \\
&= -\mu^2 + Np \sum_{s=0}^{N-1} (s+1) \frac{(N-1)!}{s!(N-1-s)!} p^s q^{N-1-s} \\
&= -\mu^2 + Np \sum_{s=0}^{N-1} s\, C_{N-1}^{s} p^s q^{(N-1)-s} + Np \sum_{s=0}^{N-1} C_{N-1}^{s} p^s q^{(N-1)-s} \\
&= -\mu^2 + Np(N-1)p + Np \cdot 1 = -\mu^2 + Np^2(N-1) + Np \\
&= -\mu^2 + N^2p^2 - Np^2 + Np = -\mu^2 + \mu^2 - Np^2 + Np = Np(1-p) \\
&= Npq
\end{aligned}$$

$$V(k) = \sigma^2 = Np(1-p) = Npq$$



The standard deviation of the binomial distribution is sometimes noted in the specialized
literature in the following way, if r is the potential number of expected outcomes in a population
of size n and s the number of non-expected ones:

$$\sigma = \sqrt{Npq} = \sqrt{N\frac{r}{n}\frac{s}{n}} = \frac{1}{n}\sqrt{Nrs}$$




Here is a plot example of the binomial $B(10, 0.5)$ distribution and cumulative distribution function:

Figure 4.79 - Binomial law B (mass and cumulative distribution function)

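It could be useful to note that some employees in companies normalize the calculation of the
mean and standard deviation to the unit of N. Then we have:

$$E\left(\frac{X}{N}\right) = \frac{1}{N}E(X) = p \qquad V\left(\frac{X}{N}\right) = \frac{1}{N^2}V(X) = \frac{1}{N}p(1-p) \qquad \sigma = \sqrt{\frac{p(1-p)}{N}}$$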



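In a sample of 100 workers, 25% are late at least once a week. The mean number and
standard deviation of late people are then:

$$E(k) = Np = 100 \cdot 0.25 = 25$$
$$\sigma_k = \sqrt{Np(1-p)} = \sqrt{100 \cdot 0.25(1-0.25)} \cong 4.33$$

Normalized to the unit of N this gives us:

$$E\left(\frac{k}{N}\right) = p = 0.25 \qquad \sigma_{k/N} = \sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{0.25(1-0.25)}{100}} \cong 0.0433 \tag{7.277}$$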


Let us now calculate the mode. Because the function is discrete we cannot use the derivative. So
we will use a trick. We compute the ratio:

$$\frac{B(N, k+1)}{B(N, k)}$$



and we check that this ratio is > 1 for every k < k* and < 1 for every k > k*, for some integer 
k* that is the k value corresponding to the modal value. 

Let $a_k = P(X = k)$. We have:

$$a_k = C_N^k p^k q^{N-k} \qquad a_{k+1} = C_N^{k+1} p^{k+1} q^{N-k-1}$$


We calculate the ratio:

$$\frac{a_{k+1}}{a_k} = \frac{C_N^{k+1} p^{k+1} q^{N-k-1}}{C_N^{k} p^{k} q^{N-k}} = \frac{\dfrac{N!}{(k+1)!(N-(k+1))!} p^{k+1} q^{N-k-1}}{\dfrac{N!}{k!(N-k)!} p^{k} q^{N-k}} = \frac{N-k}{k+1}\frac{p}{q} = \frac{N-k}{k+1}\frac{p}{1-p}$$

What is important now is to analyze:

$$\frac{N-k}{k+1}\frac{p}{1-p} = \frac{Np - kp}{k - kp + 1 - p}$$

depending on the value of k. First we can see that this ratio is equal to 1, and therefore we have
two modes, if:

$$\frac{Np - kp}{k - kp + 1 - p} = 1 \Leftrightarrow Np - kp = k - kp + 1 - p$$

That is to say if $k = Np + p - 1 = p(N+1) - 1$. This can be seen as the limit point of interest.
But don't forget we are looking for the k such that the ratio is less than 1. So we try two values:

$$k = [p(N+1) - 1] + 1 \qquad k = [p(N+1) - 1] - 1$$

Injecting these in our ratio we see that:

$$k = [p(N+1) - 1] + 1 = k^* = M_0$$

is the value we were looking for. Finally, there are two possible situations for the mode: a unique
modal value, or a double modal value (when $p(N+1) - 1$ is an integer, so that the ratio equals 1 there).
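
The ratio argument above can be turned into a short numerical check. The sketch below is only illustrative (the helper names `binom_pmf` and `mode_by_ratio` are ours): it walks k upwards while the ratio $a_{k+1}/a_k = \frac{N-k}{k+1}\frac{p}{1-p}$ is strictly greater than 1, and compares the result with a brute-force search for the largest probability.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial mass function a_k = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mode_by_ratio(n, p):
    """Smallest k at which a_{k+1}/a_k stops being strictly greater than 1."""
    k = 0
    # a_{k+1}/a_k > 1  is equivalent to  (n - k) p > (k + 1)(1 - p)
    while k < n and (n - k) * p > (k + 1) * (1 - p):
        k += 1
    return k

def mode_by_search(n, p):
    """Brute-force mode: first k with the largest probability."""
    return max(range(n + 1), key=lambda k: binom_pmf(k, n, p))

for n, p in [(10, 0.5), (100, 0.25), (9, 0.5)]:
    print(n, p, mode_by_ratio(n, p), mode_by_search(n, p))
```

With n = 9 and p = 0.5 the quantity p(n + 1) - 1 = 4 is an integer, so the ratio equals 1 at k = 4 and the values k = 4 and k = 5 share the same probability: this is the double modal case.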

As we know, the median value is the value $M_e$ of X such that we have:

$$\sum_{k=0}^{M_e} C_N^k p^k (1-p)^{N-k} = 0.5 \tag{7.286}$$

But we have not yet found an easy proof to determine $M_e$ in the general case for the Binomial distribution.

To conclude on the binomial law, we will now develop a result that we will need to build the
McNemar paired test for a square contingency table (and as it is square it is also dichotomous)
that we will study in the section of Theoretical Computing.



We need for this test to calculate the covariance of two paired binomial random variables (this
is why the covariance is non-zero):

$$\mathrm{cov}(n_i, n_j) = E(n_i n_j) - E(n_i)E(n_j) \tag{7.287}$$

As they are paired, this means that:

$$n_i + n_j = n \qquad p_i = 1 - p_j \qquad 1 - p_i = p_j$$


And therefore:

$$\mathrm{cov}(n_i, n_j) = E(n_i n_j) - E(n_i)E(n_j) = E(n_i n_j) - (np_i)(np_j) = E(n_i n_j) - n^2 p_j(1 - p_j)$$


Now comes the difficulty, which is to calculate $E(n_i n_j)$. To calculate this term there exists, to
our knowledge, no other method than looking for the law of the pair (sometimes we can get around
such an approach). In this case it is a multinomial distribution (more precisely: trinomial) that it is
customary to write in the following way by construction:

$$M(n, k, l, p_i, p_j, 1 - p_i - p_j) = \frac{n!}{n_i!\, n_j!\, (n - n_i - n_j)!}\, p_i^{n_i} p_j^{n_j} (1 - p_i - p_j)^{n - n_i - n_j}$$

that we will now temporarily write as follows to condense the expression:

$$M(n, k, l, p, q, r) = \frac{n!}{k!\, l!\, (n-k-l)!}\, p^k q^l r^{n-k-l}$$



So we have a trinomial law as we are looking for the number of times we have the event k, the 
event l and neither one nor the other (so the rest of the time). 


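We then get:

$$E(kl) = \sum_{k,l} kl\, M(n,k,l,p,q,r) = \underbrace{0}_{k=0,\, l \neq 0} + \underbrace{0}_{l=0,\, k \neq 0} + \underbrace{0}_{k=0,\, l=0} + \sum_{k \geq 1,\, l \geq 1} kl\, M(n,k,l,p,q,r)$$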

If $k \geq 1$ and $l \geq 1$, we obtain:

$$kl\frac{n!}{k!\, l!\, (n-k-l)!} = kl\frac{n(n-1)(n-2)!}{k(k-1)!\, l(l-1)!\, (n-k-l)!} = n(n-1)\frac{(n-2)!}{(k-1)!\,(l-1)!\,(n-k-l)!}$$

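Now we use this relation in the joint mean:

$$\begin{aligned}
E(kl) &= \sum_{k \geq 1,\, l \geq 1} kl \frac{n!}{k!\, l!\, (n-k-l)!}\, p^k q^l r^{n-k-l} \\
&= n(n-1) \sum_{k \geq 1,\, l \geq 1} \frac{(n-2)!}{(k-1)!\,(l-1)!\,(n-k-l)!}\, p^k q^l r^{n-k-l} \\
&= n(n-1)pq \sum_{k \geq 1,\, l \geq 1} \frac{(n-2)!}{(k-1)!\,(l-1)!\,(n-k-l)!}\, p^{k-1} q^{l-1} r^{n-k-l}
\end{aligned}$$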



Consider now the special case where n is equal to 2. We then have:

$$\begin{aligned}
\sum_{1 \leq k \leq n,\, 1 \leq l \leq n} \frac{(n-2)!}{(k-1)!\,(l-1)!\,(n-k-l)!}\, p^{k-1} q^{l-1} r^{n-k-l}
&= \sum_{1 \leq k \leq 2,\, 1 \leq l \leq 2} \frac{0!}{(k-1)!\,(l-1)!\,(2-k-l)!}\, p^{k-1} q^{l-1} r^{2-k-l} \\
&= \frac{0!}{(1-1)!\,(1-1)!\,(2-1-1)!}\, p^{1-1} q^{1-1} r^{2-1-1} = 1
\end{aligned}$$

where the sum is reduced to only one term because if we take for example $k = 2$, $l = 1$ we get
a negative factorial in the denominator.

For n equal to 3, the result will also be 1, and so on (we will assume, to simplify, that some
numerical examples will suffice to convince the reader of the generality of this property, because
it is very boring to write with LaTeX). More generally, the sum is the multinomial expansion of
$(p + q + r)^{n-2}$, which equals 1 since $p + q + r = 1$.

Then we have:

$$E(kl) = n(n-1)pq \sum_{k \geq 1,\, l \geq 1} \frac{(n-2)!}{(k-1)!\,(l-1)!\,(n-k-l)!}\, p^{k-1} q^{l-1} r^{n-k-l} = n(n-1)pq$$


So in the end:

$$\mathrm{cov}(n_i, n_j) = E(n_i n_j) - n^2 p_j(1 - p_j) = n(n-1)p_i p_j - n^2 p_i p_j = -n p_i p_j \tag{7.297}$$

And this is the major result we will need for the study of the McNemar test.
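
The result $-np_ip_j$ can be checked numerically by enumerating the trinomial law directly. The following sketch is only an illustration under our own naming (`trinomial_pmf`, `trinomial_covariance`) and with arbitrarily chosen values n = 10, p = 0.2, q = 0.3.

```python
from math import factorial

def trinomial_pmf(n, k, l, p, q):
    """P(event 1 occurs k times, event 2 occurs l times, neither n-k-l times)."""
    r = 1 - p - q
    coeff = factorial(n) // (factorial(k) * factorial(l) * factorial(n - k - l))
    return coeff * p**k * q**l * r**(n - k - l)

def trinomial_covariance(n, p, q):
    """cov(k, l) = E(kl) - E(k)E(l), computed by exhaustive enumeration."""
    e_k = e_l = e_kl = 0.0
    for k in range(n + 1):
        for l in range(n - k + 1):
            w = trinomial_pmf(n, k, l, p, q)
            e_k += k * w
            e_l += l * w
            e_kl += k * l * w
    return e_kl - e_k * e_l

n, p, q = 10, 0.2, 0.3  # arbitrary illustrative values
print("enumerated covariance:", trinomial_covariance(n, p, q))
print("closed form -n p q   :", -n * p * q)
```

The enumeration is quadratic in n, which is fine for a check of this kind but not meant as an efficient method.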

7.6.5 Negative Binomial Distribution 

The negative binomial distribution applies in the same situation as the binomial distribution,
but it gives the probability of having E failures before the R-th success when the probability of
success is p (or, conversely, the probability of having R successes before the E-th failure when the
failure probability is p).

We will introduce this important distribution with an example. Consider for this purpose
the following probabilities:

P(success) = 0.2 = p P(failure) = 0.8 = 1 — p = q (7.298) 

Imagine that we have done 10 trials and we wanted to stop at the third success and that the 10th 
trial is the third successful one! We will write this: 

[1 2 3 4 5 6 7 8 9] 10 (7.299) 

Now we highlight what we will consider as the successes (R) and failures (E): 

[1 2 3 4 5 6 7 8 9] 10 


[eereeeree] r 



We thus have 7 failures and 3 successes. In an experiment where the draws are independent (or
can be considered as independent...), the probability that we get this particular result is:

$$(0.8)^7 (0.2)^3 \tag{7.301}$$

But the order of successes and failures in the bracketed part is irrelevant. So, as we have 2
successes among the 9 trials in brackets, it follows that the probability of obtaining the same result
regardless of the order is then, using combinatorics:

$$C_9^2 (0.8)^7 (0.2)^3 = \frac{9!}{2!\,7!} (0.8)^7 (0.2)^3 \cong 0.0604 \tag{7.302}$$

which corresponds to the probability of having 7 failures before the 3rd success (or, seen
otherwise: 3 successes after 10 trials). This can be written with Microsoft Excel 14.0.6123 or later
(7 + 3 = 10 trials, made up of 7 failures and 3 successes):

=NEGBINOM.DIST(7,3,0.2,0)=0.0604

We now generalize the previous notation by writing k for the number of failures, N for the total
number of trials and p for the probability of success:

$$NB(N, k, p) = C_{N-1}^{N-k-1} q^k p^{N-k} = \binom{N-1}{N-k-1} q^k p^{N-k} \tag{7.303}$$

However, there are several possible notations, because the previous relation is not very intuitive
in practice, as the reader may have noticed. Thus, if we denote by k the number of
successes and not the number of failures, then we have (the most common way of writing it, from
my point of view, among a lot of other notations) the following probability of having N - k
failures before the k-th success, with a success probability p (or of N - k successes before the
k-th failure if p is the failure probability... it's symmetrical!):

$$NB(N, k, p) = C_{N-1}^{k-1} p^k q^{N-k} = \binom{N-1}{k-1} p^k q^{N-k} \tag{7.304}$$

so that the comparison with the formulation of the binomial distribution proved above is then
perhaps more obvious!

However, it is more common to write the previous relation by removing N, because for the
moment the notation is still perhaps not very clear. For this, we denote by R the number of successes,
E the number of failures and p the probability of a failure, and then comes the probability of having
R successes before the E-th failure (this is perhaps much clearer...):

$$NB(R, E, p) = C_{E+R-1}^{E-1} q^{(E+R)-E} p^E = \binom{E+R-1}{E-1} p^E q^R \tag{7.305}$$

We sometimes find this last relation written in other ways, using explicitly the binomial coefficient:

$$NB(R, E, p) = \binom{E+R-1}{E-1} q^R p^E = \binom{E+R-1}{R} q^R p^E = \frac{(E+R-1)!}{(E-1)!\,R!}\, q^R p^E = \frac{(E+R-1)!}{R!\,(E-1)!}\, q^R p^E$$



The cumulative probability that we have at most R successes before the E-th failure is obviously
given by:

$$\sum_{r=0}^{R} \binom{E+r-1}{E-1} q^r p^E$$



The name of this law comes from the fact that some statisticians use a definition of the
binomial coefficient with a negative value for the expression of the function. Since this
is a rather rare notation, we do not want to lose time proving the origin of the name.
You should also know that this law is also known as "Pascal's law" (as is the
geometric distribution...) in honor of Blaise Pascal, and also as "Polya's law" in honor of
George Polya.


E1. A long-term quality control has enabled us to compute the estimator of the
proportion p of nonconforming pieces as equal to 2% at the output of a production line.
We would like to know the cumulative probability to have 200 conforming pieces before the 3rd
defective piece appears. With Microsoft Excel 14.0.6123 or later it comes, using the
negative binomial distribution:

=NEGBINOM.DIST(200,3,0.02,1)=77.35%

E2. To compare with the binomial distribution, we can ask ourselves what is the cumulative
probability of drawing 198 non-defective parts from 201, using Microsoft Excel
14.0.6123 or later:

=BINOM.DIST(198,201,0.98,1)=76.77%

Therefore we see that the difference is small. In fact the difference between the two laws
is in practice so small that we then almost always use the binomial law (but you should
still be careful with this choice!).
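
The spreadsheet values quoted in this section can be reproduced, at least approximately, with a short Python sketch. The function names below are our own, and we assume that the cumulative spreadsheet function sums the mass function from 0 up to the given count, which is how we read the examples above.

```python
from math import comb

def nb_pmf(r, e, p):
    """P(exactly r 'other' outcomes occur before the e-th outcome of probability p)."""
    return comb(e + r - 1, e - 1) * (1 - p)**r * p**e

def nb_cdf(r_max, e, p):
    """P(at most r_max 'other' outcomes occur before the e-th outcome of probability p)."""
    return sum(nb_pmf(r, e, p) for r in range(r_max + 1))

def binom_cdf(k, n, p):
    """P(at most k outcomes of probability p in n trials)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# 7 failures before the 3rd success, success probability 0.2
print(f"{nb_pmf(7, 3, 0.2):.4f}")
# at most 200 conforming pieces before the 3rd defective one, defect rate 2%
print(f"{nb_cdf(200, 3, 0.02):.4f}")
# at most 198 conforming pieces among 201, conformity rate 98%
print(f"{binom_cdf(198, 201, 0.98):.4f}")
```

The two cumulative values differ only slightly, which is the point made in example E2.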

As usual, we will now determine the variance and mean of this law. Let's start with the mean
of having R successes when the E-th failure appears, knowing that the probability of a failure is
p. For this we will use a very simple and ingenious trick (the whole art was in thinking of it...). If we
return to our initial example:

[1 2 3 4 5 6 7 8 9] 10

[r r e r r r e r r] e

and we rewrite this example as follows:

[1 2 3 | 4 5 6 7 | 8 9 10]

[R R E | R R R E | R R E]

 = X_1    = X_2     = X_3





We then notice that the third success R of the first notation can be decomposed into the sum of
three geometric random variables such that:

$$R = X_1 + X_2 + \ldots + X_n \tag{7.310}$$

With, in the case of this particular example, $n = 3$, corresponding in fact to $E = 3$. So quite
generally the sum of n geometric random variables always gives a negative binomial distribution
if the probability p is equal for each geometric variable! Anyway... as we have proved the
expression of the mean and variance of the Geometric law as (thus giving us the mean rank of
the first failure):

$$\mathrm{E}(X) = \frac{1}{p} \qquad \mathrm{V}(X) = \frac{1-p}{p^2} = \frac{q}{p^2} \tag{7.311}$$

since the random variables are independent and have the same parameters, it follows for the
negative binomial that the mean of the rank of the E-th failure is, using the property of the mean:

$$\mathrm{E}(X) = \mathrm{E}(X_1 + X_2 + \ldots + X_n) = \mathrm{E}(X_1) + \mathrm{E}(X_2) + \ldots + \mathrm{E}(X_n) = n\mathrm{E}(X_i) = n\frac{1}{p} = \frac{E}{p}$$

And therefore for the variance of the negative binomial distribution:

$$\mathrm{V}(X) = \mathrm{V}(X_1 + X_2 + \ldots + X_n) = \mathrm{V}(X_1) + \mathrm{V}(X_2) + \ldots + \mathrm{V}(X_n) = n\mathrm{V}(X_i) = n\frac{1-p}{p^2} = E\frac{1-p}{p^2}$$

So the mean and variance of the rank (corresponding to the number of trials N or, from another
point of view, the mean number of successes by the simple subtraction X - E) to have the E-th
failure is then, to summarize:

$$\mathrm{E}(X) = \frac{E}{p} \qquad \mathrm{V}(X) = E\frac{1-p}{p^2} \tag{7.314}$$

Thus, putting $E = 1$, we fall back on the mean and variance of the geometric distribution!

Now, let Y be the random variable representing the number of trials before the E-th failure.
We then have the following expressions for the variance and the mean that are very common in
the literature (these expressions of mean and variance correspond to what we can find for the
negative binomial law in Wikipedia for example):

$$\mathrm{E}(Y) = \mathrm{E}(X - 1) = \mathrm{E}(X) - \mathrm{E}(1) = \mathrm{E}(X) - 1 = \frac{E}{p} - 1 = \frac{E - p}{p}$$

$$\mathrm{V}(Y) = \mathrm{V}(X - 1) = \mathrm{V}(X) + \mathrm{V}(1) = \mathrm{V}(X) = \frac{E(1-p)}{p^2}$$






What is the expected number (mean) of trials we can expect before we fall on the
third non-conforming part, knowing that the probability of a non-conforming part is 2%?

$$\mathrm{E}(X) = \frac{E}{p} = \frac{3}{2\%} = 150 \tag{7.316}$$

and for the standard deviation:

$$\sigma = \frac{\sqrt{E(1-p)}}{p} = \frac{\sqrt{3(1 - 0.02)}}{0.02} \cong 85.732 \tag{7.317}$$
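
As a sanity check of these two closed forms, one can sum the distribution of the rank numerically. The sketch below is ours (the name `rank_pmf` and the truncation at 5,000 trials are assumptions made for the illustration; the neglected tail is vanishingly small for p = 2%).

```python
from math import comb, sqrt

def rank_pmf(x, e, p):
    """P(the e-th event of probability p occurs exactly at trial number x)."""
    return comb(x - 1, e - 1) * p**e * (1 - p)**(x - e)

e_count, p = 3, 0.02

# Closed forms from the text
mean_rank = e_count / p
std_rank = sqrt(e_count * (1 - p)) / p

# Numerical summation, truncated at an arbitrary large number of trials
xs = range(e_count, 5000)
num_mean = sum(x * rank_pmf(x, e_count, p) for x in xs)
num_var = sum((x - num_mean) ** 2 * rank_pmf(x, e_count, p) for x in xs)

print(f"closed form: mean = {mean_rank:.3f}, std = {std_rank:.3f}")
print(f"summation  : mean = {num_mean:.3f}, std = {sqrt(num_var):.3f}")
```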

As always, the reader will find below a plot example of the distribution and cumulative
distribution function for the negative binomial law of parameters $NB(N, k, p) = NB(N, 3, 0.6)$, based
on the example of the beginning, but where the only difference is the probability of success, where
we have taken 60% instead of 20%.

Thus, there is a 21.6% probability of having the third success at the third trial (i.e.
0 trials more than the number of successes), a 25.92% probability of having the third success
at the fourth trial (i.e. one trial more than the number of successes), a 20.7%
probability of having the third success at the fifth trial (i.e. two trials more than the
number of successes) and so on:

Figure 4.80 - Negative Binomial law NB (mass and cumulative distribution function) 

The above distributions are truncated to 9 (corresponding to 12 trials) but they theoretically 
continue indefinitely. What particularly distinguishes the binomial and geometric distributions 
from the negative binomial are the tails of the distribution. 

The negative binomial distribution has an important place in a special regression technique that
we will see in the section of Theoretical Computing.



7.6.6 Hypergeometric Distribution 

To approach this function, we consider a simple example (but not very interesting in practice),
that of an urn containing n balls where m are black and the other m' white (for several important
examples used in the industry, refer to the sections of Industrial Engineering or Numerical
Methods). We take successively, and without replacement in the urn, p balls. The question is to
find the probability that among the p balls, there are k that are black (in this statement the order
of the drawing does not interest us).

We often talk about "exhaustive sampling" with the hypergeometric distribution because, unlike
for the binomial distribution, the size of the lot which is the basis for the sampling
will appear in the law.


This is also equivalent to a non-ordered sampling without replacement (see section
Probability) with a constraint on the occurrences, sometimes called "simultaneous sampling".
We will often use the hypergeometric distribution in the field of quality and
reliability, where the black balls are associated with items with defects and the white ones with
items without defects.

The p balls can be chosen among n balls in $C_n^p$ ways (thus representing the number of possible
different outcomes) with, as a reminder (see section Probability):

$$C_n^p = \binom{n}{p} = \frac{n!}{p!(n-p)!}$$

The k black balls can be chosen among the m black in $C_m^k$ ways. The $p - k$ white balls can be
chosen in $C_{n-m}^{p-k}$ ways. There are therefore $C_m^k C_{n-m}^{p-k}$ ways to have k black balls and $p - k$ white balls.


The searched probability is therefore given by (we will see an alternative notation in the section
of Industrial Engineering):

$$H(n, p, m, k) = \frac{C_m^k C_{n-m}^{p-k}}{C_n^p} = \frac{\dfrac{m!}{k!(m-k)!}\,\dfrac{(n-m)!}{(p-k)!((n-m)-(p-k))!}}{\dfrac{n!}{p!(n-p)!}}$$

and is said to follow a "Hypergeometric distribution" (or "Hypergeometric law") and can
fortunately be obtained directly in Microsoft Excel 14.0.7153 with the function HYPGEOM.DIST().



E1. We want to develop a small computer program of 10,000 lines of code (n). The
return on experience shows that the probability of failure is one bug per 1,000 lines of
code (or 0.1% of 10,000 lines), which corresponds to the value m.

We test about 50% of the functionality of the software randomly before sending it to the
customer (corresponding to the equivalent of 5,000 lines, that is p). The probability of
observing 5 bugs (k) is then given with Microsoft Excel 14.0.715:

=HYPGEOMDIST(k,p,m,n)=HYPGEOMDIST(5,5000,0.1%*10000,10000)=24.62%

E2. In a small single production of a batch of 1,000 pieces we know that 30% on average
are bad because of the complexity of the pieces and by return on experience from a
previous similar manufacturing. We know that a customer will randomly draw 20 pieces
to decide whether to accept or reject the lot. He will not reject the lot if he finds zero
defective pieces among the 20. What is the probability of having exactly 0 defective?

=HYPGEOMDIST(0,20,300,1000)=0.073%

and as we require a draw with zero defective pieces, the calculation of the hypergeometric
distribution simplifies by hand to:

$$\frac{700}{1000} \cdot \frac{699}{999} \cdot \frac{698}{998} \cdot \frac{697}{997} \cdots \frac{681}{981}$$
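
Both examples can be reproduced outside a spreadsheet. The sketch below uses Python's exact integer binomial coefficients; the function name `hypergeom_pmf` and its argument order (k, p, m, n, following the spreadsheet call above) are our own choice, and we assume m = 10 bugs for the first example (0.1% of 10,000 lines).

```python
from math import comb, prod

def hypergeom_pmf(k, p, m, n):
    """P(k marked items among p drawn without replacement from n items, m of them marked)."""
    return comb(m, k) * comb(n - m, p - k) / comb(n, p)

# E1: 5 bugs found when testing 5,000 of 10,000 lines containing 10 bugs (assumed m = 10)
print(f"{hypergeom_pmf(5, 5000, 10, 10000):.4%}")

# E2: no defective piece among 20 drawn from 1,000 pieces, 300 of them defective
direct = hypergeom_pmf(0, 20, 300, 1000)
by_hand = prod((700 - i) / (1000 - i) for i in range(20))  # 700/1000 * 699/999 * ... * 681/981
print(f"{direct:.4%}  {by_hand:.4%}")
```

The product form used for E2 is simply the chain of conditional probabilities of drawing a good piece twenty times in a row.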



It is not forbidden to make a direct calculation of the mean and variance of the hypergeometric
distribution, but the reader will without much trouble guess that this calculation will be...
relatively indigestible. So we can use an indirect method that is much more interesting!

First, the reader will perhaps, even certainly, have noticed that the experiment underlying the
hypergeometric distribution is a series of Bernoulli trials (without replacement of course!).

So we will cheat by using initially the property of linearity of the mean. We define for this
purpose a new variable corresponding implicitly in fact to the experiment of the hypergeometric
distribution (a sequence of k Bernoulli trials!):

$$X = \sum_{i=1}^{k} X_i \tag{7.321}$$

where $X_i$ is the success of obtaining at the i-th drawing a black ball (either 0 or 1). But we
know that for all i the random variable $X_i$ follows a Bernoulli function, for which we have
proved in our study of the Bernoulli distribution that $E(X_i) = p$.

Therefore, by the property of linearity of the mean we have (caution! here p is not the number
of balls, but the probability associated with an expected event!):

$$E(X) = E\left(\sum_{i=1}^{k} X_i\right) = \sum_{i=1}^{k} E(X_i) = \sum_{i=1}^{k} p = kp \tag{7.322}$$

In the Bernoulli trial, p is the probability of obtaining the desired item or event (as a reminder...).
In the hypergeometric distribution what interests us is the probability of a black ball (which are
in quantity m, therefore with m' white balls) compared to the total amount of n balls. And the
ratio obviously gives us this probability. Thus, we have:

$$\mu = E(X) = kp = k\frac{m}{m + m'} = k\frac{m}{n} \tag{7.323}$$

where k is the number of trials (do not confuse with the notation of the initial statement, where
it was denoted by the variable p!). The mean then gives the average number of black balls in a drawing
of k balls among n, where m are known as being black. The reader will have noticed that the
mean (expectation) of the hypergeometric distribution is the same as that of the binomial distribution!

To determine the variance, we use the variance of the Bernoulli distribution and the following
relation proved during the introduction of the mean and covariance at the beginning of this section:

$$V(X + Y) = V(X) + V(Y) + 2\,\mathrm{cov}(X, Y) = V(X) + V(Y) + 2(E(XY) - E(X)E(Y))$$



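Recalling that we have $X = \sum_{i=1}^{k} X_i$, we get:

$$V(X) = V\left(\sum_{i=1}^{k} X_i\right) = \sum_{i=1}^{k} V(X_i) + 2\sum_{1 \leq i < j \leq k} \left(E(X_i X_j) - E(X_i)E(X_j)\right)$$

However, for the Bernoulli law, we have:

$$V(X_i) = pq = \frac{m}{m + m'}\,\frac{m'}{m + m'} = \frac{mm'}{(m + m')^2} = \frac{mm'}{n^2}$$

Then we first already get:

$$\sum_{i=1}^{k} V(X_i) = k\frac{mm'}{(m + m')^2} = k\frac{mm'}{n^2}$$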


The calculation of E(XjXj) requires a good understanding of probabilities (this will be a good 

The mean $E(X_i X_j)$ is given (implicitly), as we know, by the weighted sum of the probabilities
that two events occur at the same time. However, our events are binary: either it is a black
ball (1) or it is a white ball (0). So all terms of the sum that do not correspond to two black balls
will be null!

The problem is then to calculate the probability of having two black balls, and it is
thus written:

$$P(X_i X_j = 1) = P((X_i = 1) \cap (X_j = 1)) = P(X_i = 1)\,P(X_j = 1 \mid X_i = 1) = \frac{m}{m' + m}\,\frac{m - 1}{(m' + m) - 1} \tag{7.328}$$

So we finally have:

$$E(X_i X_j) = \frac{m}{m' + m}\,\frac{m - 1}{(m' + m) - 1}$$




$$\mathrm{cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j) = \frac{m}{m' + m}\,\frac{m - 1}{(m' + m) - 1} - \left(\frac{m}{m' + m}\right)^2 = \frac{-mm'}{(m' + m)^2 (m' + m - 1)}$$

Finally (using the result of Gauss series seen in the section of Sequences and Series):

$$\begin{aligned}
V(X) &= \sum_{i=1}^{k} V(X_i) + 2\sum_{1 \leq i < j \leq k} \left(E(X_i X_j) - E(X_i)E(X_j)\right) \\
&= k\frac{mm'}{(m' + m)^2} + 2\sum_{1 \leq i < j \leq k} \frac{-mm'}{(m' + m)^2 (m' + m - 1)} \\
&= k\frac{mm'}{(m' + m)^2} - 2\,\frac{mm'}{(m' + m)^2 (m' + m - 1)}\,\frac{(k-1)k}{2} \\
&= \frac{kmm'(m' + m - 1) - mm'k(k-1)}{(m' + m)^2 (m' + m - 1)} = \frac{kmm'(m' + m - 1 - k + 1)}{(m' + m)^2 (m' + m - 1)} \\
&= \frac{kmm'(m' + m - k)}{(m' + m)^2 (m' + m - 1)} = k\,\frac{m}{m' + m}\,\frac{m'}{m' + m}\,\frac{m' + m - k}{m' + m - 1} \\
&= kpq\,\frac{m' + m - k}{m' + m - 1} = kpq\,\frac{n - k}{n - 1}
\end{aligned}$$

where we have used the fact that the sum:

$$\sum_{1 \leq i < j \leq k}$$

is composed of:

$$C_k^2 = \binom{k}{2} = \frac{k(k-1)}{2}$$

terms, as this corresponds to the number of ways there are to choose a pair (i, j) with i < j. Because:

$$V(X) = kpq\,\frac{n - k}{n - 1}$$

we can write:

$$\sigma = \sqrt{kpq}\,\sqrt{\frac{n - k}{n - 1}}$$
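
The closed forms for the mean and the variance can be compared with a direct enumeration of the hypergeometric mass function. This is only a sketch with our own names (`draws` plays the role of k in the derivation above, `black` that of m and `total` that of n), evaluated on the arbitrary case of 6 balls drawn from 10 of which 5 are black.

```python
from math import comb

def hypergeom_pmf(x, draws, black, total):
    """P(x black balls among `draws` balls drawn without replacement)."""
    white = total - black
    if x < 0 or x > black or draws - x < 0 or draws - x > white:
        return 0.0
    return comb(black, x) * comb(white, draws - x) / comb(total, draws)

def moments_by_enumeration(draws, black, total):
    """Mean and variance computed directly from the mass function."""
    probs = [hypergeom_pmf(x, draws, black, total) for x in range(draws + 1)]
    mean = sum(x * w for x, w in enumerate(probs))
    var = sum((x - mean) ** 2 * w for x, w in enumerate(probs))
    return mean, var

def moments_closed_form(draws, black, total):
    """Mean k*p and variance k*p*q*(n-k)/(n-1) from the derivation."""
    p = black / total
    q = 1 - p
    return draws * p, draws * p * q * (total - draws) / (total - 1)

print(moments_by_enumeration(6, 5, 10))
print(moments_closed_form(6, 5, 10))
```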




In the specialized literature, we often find the variance written in the following way by noting
the expected event r and the non-expected event s:

$$V(X) = kpq\,\frac{n - k}{n - 1} = k\,\frac{r}{n}\,\frac{s}{n}\,\frac{n - k}{n - 1} = \frac{krs(n - k)}{n^2(n - 1)} = \frac{klrs}{n^2(n - 1)}$$


so with $l = n - k$. This last notation will be very useful in the section of Theoretical Computing
for our study of the Mantel-Haenszel test. Furthermore, we see that in:

$$\sigma = \sqrt{kpq}\,\sqrt{\frac{n - k}{n - 1}}$$

there is the same standard deviation as for the binomial distribution, up to a factor that is:

$$\mathrm{fpc} = \sqrt{\frac{n - k}{n - 1}} \tag{7.338}$$

which we often find in statistics and which is named "finite population correction factor".

Here is an example plot of the distribution function and cumulative distribution for the
Hypergeometric function of parameters $(n, p, m, k) = (10, 6, 5, k)$:

Figure 4.81 - Hypergeometric law H (mass and cumulative distribution function)

We will prove now that the Hypergeometric distribution tends to a binomial distribution since 
this property is used many times in different sections of this book (especially the section of 
Industrial Engineering). 

To do this, we decompose:

$$H(n, p, m, k) = \frac{C_m^k C_{n-m}^{p-k}}{C_n^p}$$

We then get:

$$\begin{aligned}
\frac{C_m^k C_{n-m}^{p-k}}{C_n^p}
&= \frac{\dfrac{m!}{k!(m-k)!}\,\dfrac{(n-m)!}{(p-k)!((n-m)-(p-k))!}}{\dfrac{n!}{p!(n-p)!}} \\
&= \frac{m!}{k!(m-k)!}\,\frac{(n-m)!}{(p-k)!((n-m)-(p-k))!}\,\frac{p!(n-p)!}{n!} \\
&= \frac{p!}{k!(p-k)!}\,\frac{m!}{(m-k)!}\,\frac{(n-m)!}{((n-m)-(p-k))!}\,\frac{(n-p)!}{n!} \\
&= C_p^k\,\frac{m!}{(m-k)!}\,\frac{(n-m)!}{((n-m)-(p-k))!}\,\frac{(n-p)!}{n!}
\end{aligned}$$




For the second term:

$$\frac{m!}{(m-k)!} = \frac{1 \cdot 2 \cdot 3 \cdots m}{1 \cdot 2 \cdot 3 \cdots (m-k)} = m(m-1)\cdots(m-k+1)$$

For $m \to +\infty$ (...) all the factors are then of the order of m. Then we have:

$$m(m-1)\cdots(m-k+1) \cong m^k$$

For the third term, a development identical to the previous one provides (for sure we also need
$n \to +\infty$ (...)):

$$\frac{(n-m)!}{((n-m)-(p-k))!} \cong (n-m)^{p-k}$$

And for sure... we can therefore discuss $n - m$ when both terms tend to infinity... Ditto
for the fourth term:

$$\frac{(n-p)!}{n!} \cong n^{-p}$$

In conclusion we have:

$$\frac{C_m^k C_{n-m}^{p-k}}{C_n^p} \underset{n,m \to +\infty}{\cong} C_p^k\,\frac{m^k (n-m)^{p-k}}{n^p}$$


We change the notation by writing p (the number of individuals drawn) as being N. We get:

$$\frac{C_m^k C_{n-m}^{N-k}}{C_n^N} \underset{n,m \to +\infty}{\cong} C_N^k\,\frac{m^k (n-m)^{N-k}}{n^N}$$

We make another change of notation by writing b for the black balls and w for the white balls. We get:

$$\frac{C_b^k C_w^{N-k}}{C_n^N} \underset{n,b \to +\infty}{\cong} C_N^k\,\frac{b^k (n-b)^{N-k}}{n^N} = C_N^k\,\frac{b^k w^{N-k}}{n^N}$$

Finally, we denote by p the proportion of black balls and by q that of white balls in the lot. We then get:

$$\frac{C_{np}^k C_{n-np}^{N-k}}{C_n^N} \underset{n \to +\infty}{\cong} C_N^k\,\frac{(np)^k (nq)^{N-k}}{n^N} = C_N^k\,(n^k p^k)(n^{N-k} q^{N-k})\,n^{-N} = C_N^k\,p^k q^{N-k}\,n^{k+N-k-N} = C_N^k\,p^k q^{N-k}$$

We find again the binomial distribution! In practice, it is common to approximate the
hypergeometric distribution with a binomial distribution when the ratio of the number of individuals
drawn to the total number of individuals is less than 10%.
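
The convergence towards the binomial law can be observed numerically. In the sketch below (our own helper names; the proportion 0.3 and the sample size 10 are arbitrary choices for the illustration), the largest gap between the two mass functions shrinks as the lot size n grows while the sample size stays fixed.

```python
from math import comb

def hypergeom_pmf(k, draws, black, total):
    """P(k black balls among `draws` balls drawn without replacement)."""
    return comb(black, k) * comb(total - black, draws - k) / comb(total, draws)

def binom_pmf(k, draws, prop):
    """P(k black balls among `draws` balls drawn with replacement)."""
    return comb(draws, k) * prop**k * (1 - prop)**(draws - k)

def max_gap(total, draws, prop):
    """Largest absolute difference between the two mass functions."""
    black = round(total * prop)
    return max(
        abs(hypergeom_pmf(k, draws, black, total) - binom_pmf(k, draws, prop))
        for k in range(draws + 1)
    )

for total in (100, 1_000, 100_000):
    print(total, max_gap(total, draws=10, prop=0.3))
```

With a lot of 100 the sample represents 10% of the population and the gap is still visible; with a lot of 100,000 it becomes negligible.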

In practice, Monte Carlo simulations with testing adjustments (see later in this chapter) have
shown that the hypergeometric distribution could be approximated by a Normal distribution
(very important case in contingency statistical tests that we will study in the section of
Theoretical Computing) if the following three conditions are met simultaneously:

$$k\,\frac{n - m}{m} > 9 \qquad k\,\frac{m}{n - m} > 9 \qquad k < \frac{n}{10}$$

m n — m 10 




Thus graphically and approximately...: 

Figure 4.82 - Conditions of application of the approximation by a Normal distribution 


$$H(k, m, n) \cong \mathcal{N}(\mu, \sigma) = \mathcal{N}\left(k\frac{m}{n}, \sqrt{\frac{krs(n-k)}{n^2(n-1)}}\right) = \mathcal{N}\left(k\frac{m}{n}, \sqrt{\frac{kmm'(n-k)}{n^2(n-1)}}\right) \tag{7.350}$$

7.6.7 Multinomial Distribution 

The multinomial distribution (so called because it involves the binomial coefficient several
times) is a law applicable to n distinguishable events, each with a given probability, which occur
one or more times and are not necessarily ordered. This is a frequent case in marketing
research and it will be useful to build the statistical McNemar test that we will study much later

(see section Theoretical Computing). We also use this law in quantitative finance (see section 

More technically, consider the space of events $\Omega = \{1, 2, \ldots, m\}$ with the probabilities
$P(\{i\}) = p_i$, $i = 1, 2, \ldots, m$. We draw n times, with replacement (see chapter
Probability), an element of $\Omega$, element i being drawn with probability $p_i$. We look for the
probability of obtaining, in a not necessarily ordered way, event 1 $k_1$ times, event 2 $k_2$ times,
and so on, over a sequence of n drawings.

This is equivalent to the study of a sampling with replacement (see section Probability)
with constraints on the occurrences. So without constraints, we will see with an example
that we fall back on a simple sampling with replacement.


We saw in the section of Probabilities that, if we take a set of events with multiple outcomes,
then the number of different combinations of sequences we can get by taking p selected elements
among n is given by:

$$C_n^p = \frac{n!}{p!(n-p)!}$$

We have therefore:

$$C_n^{k_1} = \frac{n!}{k_1!(n-k_1)!}$$

different ways to get $k_1$ times a given event. Thus an associated probability of:

$$P(n, k_1) = C_n^{k_1} p_1^{k_1} q_1^{n-k_1} = C_n^{k_1} p_1^{k_1} (1 - p_1)^{n-k_1}$$


Now comes the particularity of the multinomial distribution: there are no failures, in contrast to
the binomial distribution. Each "pseudo-failure" can be considered as a subset draw of $k_2$ items
from the $n - k_1$ remaining elements. Thus the term:

$$C_n^{k_1} p_1^{k_1} q_1^{n-k_1}$$

will be written over the whole experiment, if we consider a particular case limited to two types of
events:

$$C_n^{k_1} p_1^{k_1}\, C_{n-k_1}^{k_2} p_2^{k_2}$$

so with:

$$C_{n-k_1}^{k_2} = \frac{(n-k_1)!}{k_2!((n-k_1)-k_2)!}$$

which gives us the number of different ways to get $k_2$ times a second event, because in the whole
sequence of n elements $k_1$ of them have already been taken, so we now have only $n - k_1$ remaining
on which we can get the $k_2$ desired.

These relations then show us that this is a situation where each event probability is considered 
as a binomial (hence its name ...). 

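So we have in the case of two sets of k-tuples:

$$C_n^{k_1} p_1^{k_1}\, C_{n-k_1}^{k_2} p_2^{k_2} = C_n^{k_1} C_{n-k_1}^{k_2}\, p_1^{k_1} p_2^{k_2} = \frac{n!}{k_1!(n-k_1)!}\,\frac{(n-k_1)!}{k_2!((n-k_1)-k_2)!}\, p_1^{k_1} p_2^{k_2} = \frac{n!}{k_1!\,k_2!\,(n-k_1-k_2)!}\, p_1^{k_1} p_2^{k_2}$$

and because:

$$-k_1 - k_2 = -n$$

we get:

$$P = \frac{n!}{k_1!\,k_2!\,0!}\, p_1^{k_1} p_2^{k_2} = \frac{n!}{k_1!\,k_2!}\, p_1^{k_1} p_2^{k_2}$$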




and we see that the construction of this distribution therefore requires that:

$$\sum_i p_i = 1 \qquad \sum_i k_i = n$$

Thus, by induction we have the probability $\mathcal{M}$ we were looking for, called the "Multinomial
function" and, using the previous relation, given by:

$$\mathcal{M} = \frac{n!}{k_1!\,k_2! \cdots k_m!}\, p_1^{k_1} p_2^{k_2} \cdots p_m^{k_m} = n! \prod_{i=1}^{m} \frac{p_i^{k_i}}{k_i!}$$

In the spreadsheet software Microsoft Excel 11.8346, the term:

$$\frac{n!}{\prod_{i=1}^{m} k_i!}$$

is named "multinomial coefficient" and is available under the name of the function
MULTINOMIAL(). In the literature we also sometimes find this term under the following
respective notations:

$$\binom{k_1 + k_2 + \ldots + k_m}{k_1, k_2, \ldots, k_m} \qquad \binom{n}{k_1, k_2, \ldots, k_m}$$


Theorem 4.33. We will show now that the multinomial distribution is effectively a probability 
distribution (because we could doubt ...). If this is the case, as we know it, the sum of the 
probabilities must be equal to 1. 

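Proof 4.33.1. Recall that in the chapter of Calculus we proved that (binomial theorem):

$$(x + y)^n = \sum_{k=0}^{n} C_n^k x^k y^{n-k}$$

Now we make a small change of notation:

$$(x_1 + x_2)^n = \sum_{k_1=0}^{n} C_n^{k_1} x_1^{k_1} x_2^{n-k_1} = \sum_{k_1=0}^{n} \frac{n!}{k_1!(n-k_1)!}\, x_1^{k_1} x_2^{n-k_1}$$

and this time a change of variables (with $k_2 = n - k_1$):

$$(x_1 + x_2)^n = \sum_{k_1=0}^{n} C_n^{k_1} x_1^{k_1} x_2^{n-k_1} = \sum_{k_1 + k_2 = n} \frac{n!}{k_1!\,k_2!}\, x_1^{k_1} x_2^{k_2}$$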

This last relation (which is the special case with two terms of the "multinomial theorem") will be
useful to us to show that the multinomial distribution is effectively a probability distribution. We
also take the special case with two groups of drawings:

$$\mathcal{M} = n! \prod_{i=1}^{2} \frac{p_i^{k_i}}{k_i!} = \frac{n!}{k_1!\,k_2!}\, p_1^{k_1} p_2^{k_2}$$



which can also be written, by the construction of the multinomial distribution:

$$\mathcal{M} = \frac{n!}{k_1!(n-k_1)!}\, p_1^{k_1} p_2^{n-k_1}$$

and therefore, the sum must be equal to unity, such that:

$$\sum_{k_1=0}^{n} \frac{n!}{k_1!(n-k_1)!}\, p_1^{k_1} p_2^{n-k_1} = 1$$

To check this we use the multinomial theorem shown above:

$$(p_1 + p_2)^n = \sum_{k_1 + k_2 = n} \frac{n!}{k_1!\,k_2!}\, p_1^{k_1} p_2^{k_2}$$

However, since by construction the sum of the probabilities $p_i$ is equal to one, we have effectively:

$$(p_1 + p_2)^n = (1)^n = 1 = \sum_{k_1=0}^{n} \frac{n!}{k_1!(n-k_1)!}\, p_1^{k_1} p_2^{n-k_1}$$

□ Q.E.D.


E1. We roll an unbiased die 12 times. What is the probability that all 6 faces appear the same number of times (not necessarily consecutively!), that is to say twice for each face:

$$M = \frac{n!}{\prod_{i=1}^{m} k_i!}\prod_{i=1}^{m} p_i^{k_i} = \frac{12!}{\prod_{i=1}^{6} 2!}\prod_{i=1}^{6}\left(\frac{1}{6}\right)^{2} = \frac{12!}{2^6\, 6^{12}} \cong 0.34\%$$

where we see well that m is the number of success groups.

E2. We roll an unbiased die 12 times. What is the probability that one single face appears 12 times (hence the "1" appears 12 times, or the "2", or the "3", etc.):

*= 1 i = 1 


So with this last example we end up with a result known from the binomial distribution.
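The two die examples can be checked numerically. The following is a minimal Python sketch, not the book's own method: the function name `multinomial_pmf` is hypothetical, and E2 is read here as "any one of the six faces appears 12 times", which is an interpretation of the statement.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Multinomial probability M = n!/(k_1!...k_m!) * p_1^k_1 ... p_m^k_m."""
    n = sum(counts)
    coeff = factorial(n)
    for k in counts:
        coeff //= factorial(k)  # stays an integer at every step
    return coeff * prod(p ** k for p, k in zip(probs, counts))

fair_die = [1 / 6] * 6

# E1: every face appears exactly twice in 12 rolls
e1 = multinomial_pmf([2] * 6, fair_die)

# E2 (assumed reading): one face, whichever it is, appears 12 times
e2 = sum(
    multinomial_pmf([12 if j == i else 0 for j in range(6)], fair_die)
    for i in range(6)
)

print(f"E1: {e1:.4%}")
print(f"E2: {e2:.3e}")
```

With only two groups the same function reduces to the binomial distribution, which is the remark made in the text.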



7.6.8 Poisson Distribution 

For some rare events, the probability p is very small and tends to zero. However, the average 
value np tends to a fixed value as n tends to infinity. 

We start from a binomial distribution with mean $\mu = np$ that we will assume finite when n tends to infinity.

The probability of k successes after n trials is (Binomial distribution):

$$B(n,k) = C_n^k p^k (1-p)^{n-k} = C_n^k p^k q^{n-k} = \binom{n}{k} p^k q^{n-k}$$

By writing $p = \frac{m}{n}$ (where m will temporarily be the new notation for the mean according to $\mu = np$), this expression can be rewritten as:

$$B(n,k) = \frac{n!}{k!\,(n-k)!}\left(\frac{m}{n}\right)^k\left(1-\frac{m}{n}\right)^{n-k}$$

By grouping the terms, we can put the value in the form:

$$B(n,k) = \frac{m^k}{k!}\left(1-\frac{m}{n}\right)^n \frac{n!}{(n-k)!\,n^k\left(1-\frac{m}{n}\right)^k}$$

We recognize that when n tends to infinity, the second factor of the product has for limit $e^{-m}$ (see Functional Analysis). As for the third factor, since we are interested in small values of k (the probability of success is very small), its limit for n tending to infinity is equal to 1.

This technique of passing to the limit is sometimes named in this context the "Poisson limit theorem".

So we get the "Poisson distribution" (or "Poisson law"), also sometimes named the "law of rare events", given by:

$$P(k) = \frac{\mu^k}{k!}e^{-\mu}$$

which can be obtained in Microsoft Excel 11.8346 with the function POISSON( ) and which in practice and in the specialized literature is often indicated by the letter u.

It is indeed a probability distribution since, using the Taylor series of the exponential (see chapter Sequences And Series), we show that the sum of the cumulative probabilities is:

$$\sum_{k=0}^{+\infty}\frac{\mu^k}{k!}e^{-\mu} = e^{-\mu}\sum_{k=0}^{+\infty}\frac{\mu^k}{k!} = e^{-\mu}e^{\mu} = 1 \qquad (7.378)$$
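The passage to the limit can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the mean is fixed at 3 (an arbitrary choice), p is set to mean/n, and the function names are hypothetical. It compares the binomial probabilities with the Poisson ones for growing n and checks that the Poisson probabilities sum to 1.

```python
from math import comb, exp, factorial

def binomial_pmf(n, k, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """Poisson probability mu^k / k! * exp(-mu)."""
    return mu ** k / factorial(k) * exp(-mu)

mu = 3.0  # assumed mean, kept fixed while n grows and p = mu / n shrinks

# Largest pointwise gap between the two laws over k = 0..10
gaps = {}
for n in (10, 100, 10_000):
    gaps[n] = max(
        abs(binomial_pmf(n, k, mu / n) - poisson_pmf(k, mu)) for k in range(11)
    )
    print(n, gaps[n])

# Normalization: the series is truncated where the terms are negligible
total = sum(poisson_pmf(k, mu) for k in range(60))
print(total)
```

The gap shrinks as n grows, which is the content of the limit argument above.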



We will frequently encounter this distribution in different sections of the book, such as in the study of preventive maintenance in the section of Industrial Engineering or in the section of Quantitative Management for the study of queuing theory (the reader can refer to them for interesting and pragmatic examples), and finally in the field of life and non-life insurance.

Here is a plot example of the Poisson distribution and cumulative distribution function with parameter $\mu = 3$:

Figure 4.83 - Poisson law $\mathcal{P}$ (mass and cumulative distribution function)

This distribution is important because it describes many processes whose probability is small and constant. It is often used in "queueing theory" (waiting time), acceptability and reliability tests and statistical quality control. Among other things, it applies to processes such as the emission of light quanta by excited atoms, the number of red blood cells seen under the microscope, or the number of incoming calls to a call center. The Poisson distribution is valid for many observations in nuclear and particle physics.

The mean (average) of the Poisson distribution is (we use the Taylor series of the exponential):

$$\mu = \mathrm{E}(k) = \sum_{k=0}^{+\infty} k\,P(k) = \sum_{k=0}^{+\infty} k\,\frac{\mu^k}{k!}e^{-\mu} = e^{-\mu}\mu\sum_{k=1}^{+\infty}\frac{\mu^{k-1}}{(k-1)!} = e^{-\mu}\mu\,e^{\mu} = \mu \qquad (7.379)$$

and gives the average number of times that you get the desired outcome. This result may seem confusing... the mean is expressed by the mean? Yes: we must simply not forget that it has been given since the beginning by:

$$\mu = np \qquad (7.380)$$


The variance of the Poisson distribution function is itself given by (again we use the Taylor series of the exponential):

$$\sigma^2 = \mathrm{V}(k) = \sum_{k=0}^{+\infty}(k-\mu)^2 P(k) = \sum_{k=0}^{+\infty}(k-\mu)^2\,\frac{\mu^k}{k!}e^{-\mu} = \mu$$

always with:

$$\mu = np$$

The important fact for the Poisson distribution is that the variance is equal to the mean; this is named the "equidispersion property of the Poisson distribution". This is a property often used in practice as an indicator to identify whether data (with discrete support) are distributed according to a Poisson distribution.
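As a quick numerical illustration of equidispersion, the sketch below sums the truncated series for the mean and the variance. The parameter value 3 and the truncation at 80 terms are arbitrary assumptions made for the example.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """Poisson probability mu^k / k! * exp(-mu)."""
    return mu ** k / factorial(k) * exp(-mu)

mu = 3.0          # assumed parameter
ks = range(80)    # truncation: the remaining terms are negligible for mu = 3

mean = sum(k * poisson_pmf(k, mu) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in ks)

print(mean, variance)  # both are expected to be close to mu
```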

The theoretical laws of statistical distributions are determined assuming completion of an infi- 
nite number of measurements. It is obvious that we can only perform in practice a finite number 
N. Hence the utility to establish correspondence between the theoretical and experimental val- 
ues. For the experimental values we obviously obtain only an approximation whose validity is, 
however, often accepted as sufficient. 

Now we will prove an important property of the Poisson distribution in the field of engineering 
and statistics that we name "stability by addition". The idea is as follows: 

Let X and Y be two independent random variables with Poisson distributions of respective parameters $\lambda$ and $\mu$. We want to verify that their sum also follows a Poisson distribution:

$$X + Y = \mathcal{P}_{\lambda+\mu}$$

Indeed:

$$P(X+Y=k) = \sum_{i=0}^{k} P\left[(X=i)\cap(Y=k-i)\right] = \sum_{i=0}^{k} P(X=i)\,P(Y=k-i) \qquad (7.384)$$

because the events are independent. Then we have:

$$P(X+Y=k) = \sum_{i=0}^{k} P(X=i)\,P(Y=k-i) = \sum_{i=0}^{k}\frac{\lambda^i e^{-\lambda}}{i!}\,\frac{\mu^{k-i}e^{-\mu}}{(k-i)!} = \frac{e^{-(\lambda+\mu)}}{k!}\sum_{i=0}^{k}\frac{k!}{i!\,(k-i)!}\,\lambda^i\mu^{k-i}$$

However, by applying the binomial theorem (see section Calculus):

$$\sum_{i=0}^{k}\frac{k!}{i!\,(k-i)!}\,\lambda^i\mu^{k-i} = \sum_{i=0}^{k} C_k^i\,\lambda^i\mu^{k-i} = (\lambda+\mu)^k$$

So in the end:

$$P(X+Y=k) = (\lambda+\mu)^k\,\frac{e^{-(\lambda+\mu)}}{k!}$$

and therefore the Poisson distribution is stable by addition. So any Poisson distribution whose parameter is known is indefinitely divisible into a finite or infinite sum of independent Poisson distributions.
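The stability by addition can be verified numerically by computing the discrete convolution directly. This is a small sketch with arbitrarily chosen parameters (1.5 and 2.5 are assumptions for the example, as are the function names).

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """Poisson probability mu^k / k! * exp(-mu)."""
    return mu ** k / factorial(k) * exp(-mu)

lam, mu = 1.5, 2.5  # assumed parameters of X and Y

def sum_pmf(k):
    """P(X + Y = k) as the convolution sum over i = 0..k."""
    return sum(poisson_pmf(i, lam) * poisson_pmf(k - i, mu) for i in range(k + 1))

# Compare the convolution with a single Poisson law of parameter lam + mu
max_gap = max(abs(sum_pmf(k) - poisson_pmf(k, lam + mu)) for k in range(30))
print(max_gap)
```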

7.6.9 Normal & Gauss-Laplace Distribution 

This distribution is the most important one in the field of statistics, following a famous theorem named the "central limit theorem" which, as we will see later, permits us to prove (among other things) that any suitably centered and scaled sum of independent identically distributed random variables with a finite mean and variance converges to a Laplace-Gaussian function (Normal distribution).

It is therefore very important to focus your attention on the developments that will be presented right now!

Let us start from a binomial function and let the number of trials n tend to infinity. If p is fixed from the beginning, the mean $\mu = np$ also tends to infinity; furthermore the standard deviation $\sigma = \sqrt{npq}$ also tends to infinity.

The case where p varies and tends to 0 while keeping the mean fixed has already been studied during the study of the Poisson function.

If we want to calculate the limit of the binomial function, it will then be necessary to make a change of origin, which stabilizes the mean (to 0 for example), and a change of unit that stabilizes the standard deviation (to 1 for example).

Let us now denote by $P_n(k)$ the binomial probability of k successes, and let us first see how $P_n(k)$ varies with k by calculating the difference:

$$P_n(k+1) - P_n(k) = C_n^{k+1}p^{k+1}q^{n-k-1} - C_n^k p^k q^{n-k} = C_n^k p^k q^{n-k}\left(\frac{(n-k)p}{(k+1)q} - 1\right) = P_n(k)\,\frac{np-k-q}{(k+1)q}$$

We conclude that $P_n(k)$ is an increasing function of k as long as $np - k - q$ is positive (for n, p and q fixed). To see it, just take a few values (of the right-hand term of the equality) or observe the graph of the binomial distribution function, remembering that:

$$\mu = np \qquad (7.389)$$

As $q < 1$, it is therefore evident that the value of k close to the mean $\mu = np$ of the binomial distribution gives the maximum of $P_n(k)$.

On the other hand, the difference $P_n(k+1) - P_n(k)$ is the rate of increase of the function $P_n(k)$. We can then write:

$$\frac{\Delta P_n(k)}{\Delta k} = \frac{P_n(k+1) - P_n(k)}{(k+1)-k}$$

as being the slope of the function.

We now define a new random variable such that its mean is zero (negligible variations) and its standard deviation is equal to unity (in other words, a centered-reduced variable). Then we have:

$$x = \frac{k-np}{\sqrt{npq}}$$

Then we also have with this new random variable:

$$\Delta x = \frac{(k+1)-np}{\sqrt{npq}} - \frac{k-np}{\sqrt{npq}} = \frac{[(k+1)-np]-(k-np)}{\sqrt{npq}} = \frac{1}{\sqrt{npq}}$$

Let us denote by $F(x)$ the value of $P_n(k)$ calculated using the new random variable with zero mean and unit standard deviation, whose expression we seek when n tends to infinity.

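Let us go back to:

$$\frac{\Delta P_n(k)}{\Delta k} = \frac{np-k-q}{(k+1)q}\,P_n(k) = \frac{-(k-np)-q}{(k+1)q}\,P_n(k)$$

To simplify the study of this relation when n tends to infinity and k to the mean $\mu = np$, multiply both sides by $npq/\sqrt{npq}$:

$$\frac{\Delta P_n(k)}{\Delta k}\,\frac{npq}{\sqrt{npq}} = \frac{np-k-q}{(k+1)q}\,\frac{npq}{\sqrt{npq}}\,P_n(k) = \frac{-(k-np)-q}{(k+1)q}\,\frac{npq}{\sqrt{npq}}\,P_n(k)$$

We now rewrite the right-hand side of this equality. Then we get:

$$\frac{-(k-np)-q}{(k+1)q}\,\frac{npq}{\sqrt{npq}}\,P_n(k) = \frac{[-(k-np)-q]\,np}{(k+1)\sqrt{npq}}\,P_n(k)$$

And now let us rewrite the left-hand term of the relation before the previous one. We then get:

$$\frac{\Delta P_n(k)}{\Delta k}\,\frac{npq}{\sqrt{npq}} = \frac{\Delta P_n(k)}{(k+1)-k}\,\frac{npq}{\sqrt{npq}} = \frac{\Delta P_n(k)}{[(k+1)-np]-(k-np)}\,\frac{npq}{\sqrt{npq}}$$

$$= \frac{\Delta P_n(k)}{\dfrac{[(k+1)-np]-(k-np)}{\sqrt{npq}}}\,\frac{npq}{\sqrt{npq}\sqrt{npq}} = \frac{\Delta P_n(k)}{\dfrac{[(k+1)-np]-(k-np)}{\sqrt{npq}}} = \frac{\Delta P_n(k)}{\Delta x}$$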





After passing to the limit for n tending to infinity we have, in a first step, for the denominator of the right-hand term obtained above:

$$\frac{[-(k-np)-q]\,np}{(k+1)\sqrt{npq}}\,P_n(k)$$

the following simplification:

$$(k+1)\sqrt{npq} \underset{n\to+\infty}{\cong} k\sqrt{npq}$$

so that this term becomes:

$$-\frac{[(k-np)+q]\,np}{k\sqrt{npq}}\,P_n(k) = -\left(\frac{np}{k}\,\frac{k-np}{\sqrt{npq}} + \frac{q\,np}{k\sqrt{npq}}\right)P_n(k)$$

and, in a second step, taking into account that the considered values of k are then in the neighborhood of the mean np, we get:

$$\frac{np}{k}\,\frac{k-np}{\sqrt{npq}} \underset{n\to+\infty}{\cong} \frac{k-np}{\sqrt{npq}} \qquad \frac{q\,np}{k\sqrt{npq}} \underset{n\to+\infty}{\cong} \frac{q\,np}{np\sqrt{npq}} = \frac{q}{\sqrt{npq}} \underset{n\to+\infty}{\longrightarrow} 0$$

and as:

$$x = \frac{k-np}{\sqrt{npq}} \qquad P_n(k) \underset{n\to+\infty}{:=} F(x) \qquad (7.403)$$

where F(x) represents (awkwardly), for the few lines that follow, the density function as n tends to infinity.

Finally we have:

$$\frac{\mathrm{d}F(x)}{\mathrm{d}x} = -x\,F(x)$$

This relation can also be rewritten by rearranging the terms:

$$\frac{\mathrm{d}F(x)}{F(x)} = -x\,\mathrm{d}x$$

and by integrating both sides of this equality we obtain (see section Differential And Integral Calculus):

$$\ln(F(x)) = -\frac{x^2}{2} + c^{te}$$



The following function is a solution of the above equation:

$$F(x) = Ae^{-\frac{x^2}{2}}$$

Indeed:

$$\ln\left(Ae^{-\frac{x^2}{2}}\right) = \ln(A) + \ln e^{-\frac{x^2}{2}} = c^{te} - \frac{x^2}{2} = -\frac{x^2}{2} + c^{te}$$

The constant is determined by the condition that:

$$\int_{-\infty}^{+\infty}F(x)\,\mathrm{d}x = 1$$

which represents the sum of all probabilities, which must be equal to 1. We prove that for this we need to have:

$$A = \frac{1}{\sqrt{2\pi}}$$


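Proof 4.33.2. We have:

$$\int_{-\infty}^{+\infty}e^{-\frac{x^2}{2}}\,\mathrm{d}x = \int_{-\infty}^{+\infty}e^{-\left(\frac{x}{\sqrt{2}}\right)^2}\mathrm{d}x = \sqrt{2}\int_{-\infty}^{+\infty}e^{-z^2}\,\mathrm{d}z$$

So let us focus on the last term of this equality. Thus:

$$I = \int_{-\infty}^{+\infty}e^{-x^2}\,\mathrm{d}x = 2\int_{0}^{+\infty}e^{-x^2}\,\mathrm{d}x$$

since $e^{-x^2}$ is an even function (see section Functional Analysis). Now we write the square of the integral as follows:

$$I^2 = 4\lim_{R\to+\infty}\left(\int_0^R e^{-x^2}\,\mathrm{d}x\right)\left(\int_0^R e^{-y^2}\,\mathrm{d}y\right) = 4\lim_{R\to+\infty}\int_0^R\!\!\int_0^R e^{-(x^2+y^2)}\,\mathrm{d}x\,\mathrm{d}y \qquad (7.413)$$

and make a change of variable to polar coordinates, therefore also using the Jacobian in these same coordinates (see section Differential And Integral Calculus); in the limit the square $[0,R]^2$ and the quarter disc of radius R cover the same quadrant:

$$I^2 = 4\lim_{R\to+\infty}\int_0^{\pi/2}\!\!\int_0^R e^{-r^2}\,r\,\mathrm{d}r\,\mathrm{d}\phi = 4\,\frac{\pi}{2}\lim_{R\to+\infty}\int_0^R e^{-r^2}\,r\,\mathrm{d}r = 2\pi\lim_{R\to+\infty}\left[-\frac{e^{-r^2}}{2}\right]_0^R = 2\pi\left(0+\frac{1}{2}\right) = \pi$$

Therefore:

$$I = \sqrt{\pi}$$

By extension for $e^{-\frac{x^2}{2}}$ we have:

$$I = A^{-1} = \sqrt{2\pi}$$

□ Q.E.D.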

We thus obtain the "standard Normal distribution", written as a probability density function (noted with the capital letter F, which can unfortunately lead to confusion in the present development with the notation of the cumulative distribution function... we apologize...):

$$F(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$$

which can be calculated in Microsoft Excel 11.8346 with the function NORMSDIST( ).

For information, a variable following a Normal centered reduced distribution is by tradition 
often noted Z ("Zentriert" in German). 
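Both the normalization constant and the limit of the binomial law can be checked numerically. The sketch below makes a few arbitrary assumptions (n = 400, p = 0.3, a midpoint rule on [-10, 10]) and uses hypothetical function names; it is an illustration, not a proof.

```python
from math import comb, exp, pi, sqrt

def phi(x):
    """Standard Normal density exp(-x^2/2) / sqrt(2*pi)."""
    return exp(-x * x / 2) / sqrt(2 * pi)

# 1) Midpoint-rule estimate of the integral of exp(-x^2/2), expected near sqrt(2*pi)
steps, lo, hi = 20_000, -10.0, 10.0
h = (hi - lo) / steps
integral = h * sum(exp(-((lo + (i + 0.5) * h) ** 2) / 2) for i in range(steps))
print(integral, sqrt(2 * pi))

# 2) Binomial probabilities rescaled by sqrt(npq) against the Normal density
n, p = 400, 0.3          # assumed values for the illustration
q = 1 - p
s = sqrt(n * p * q)      # standard deviation of the binomial law

def binomial_pmf(k):
    return comb(n, k) * p ** k * q ** (n - k)

k_values = range(int(n * p - 3 * s), int(n * p + 3 * s) + 1)
max_gap = max(abs(s * binomial_pmf(k) - phi((k - n * p) / s)) for k in k_values)
print(max_gap)
```

The factor s in front of the binomial probability accounts for the change of unit: one step in k corresponds to a step of 1/s in the reduced variable x.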

Returning to non-normalized variables:

$$x = \frac{k-np}{\sqrt{npq}} = \frac{k-\mu}{\sigma}$$

so we get the "Gauss-Laplace function" (or "Gauss-Laplace law"), also named "Normal distribution", given in the form of a probability density in this book by:

$$P(k,\mu,\sigma) = \mathcal{N}(\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(k-\mu)^2}{2\sigma^2}}$$

The cumulative probability (cumulative distribution function) of having at most a certain value x is given by:

$$P(k \leq x) = \Phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{(k-\mu)^2}{2\sigma^2}}\,\mathrm{d}k$$

Here is a plot example of the distribution and cumulative distribution function for the Normal law with the example parameters $(\mu,\sigma) = (0,1)$, which is therefore the standard (centered reduced) Normal distribution:

Figure 4.84 - Normal law $\mathcal{N}$ (mass and cumulative distribution function)

This law governs, under very general and frequently encountered conditions, many random phenomena. It is also symmetrical with respect to the mean (this is important to remember!).

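We will now show that $\mu$ indeed represents the mean (or average) of X (this is a bit silly but we can still check...):

$$\mathrm{E}(X) = \int_{-\infty}^{+\infty}x\,f(x)\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}x\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,\mathrm{d}x$$

We put:

$$u = \frac{x-\mu}{\sigma}$$

We then have:

$$\mathrm{E}(X) = \int_{-\infty}^{+\infty}x\,f(x)\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}(\sigma u+\mu)\,e^{-\frac{u^2}{2}}\,\sigma\,\mathrm{d}u$$

$$= \frac{1}{\sqrt{2\pi}}\left[\int_{-\infty}^{+\infty}\sigma u\,e^{-\frac{u^2}{2}}\,\mathrm{d}u + \int_{-\infty}^{+\infty}\mu\,e^{-\frac{u^2}{2}}\,\mathrm{d}u\right] = \frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}u\,e^{-\frac{u^2}{2}}\,\mathrm{d}u + \frac{\mu}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{u^2}{2}}\,\mathrm{d}u$$

Let us now calculate the first integral:

$$J = \int_{-\infty}^{+\infty}u\,e^{-\frac{u^2}{2}}\,\mathrm{d}u = \left[-e^{-\frac{u^2}{2}}\right]_{-\infty}^{+\infty} = 0 - 0 = 0$$

So we finally get:

$$\mathrm{E}(X) = \frac{\mu}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{u^2}{2}}\,\mathrm{d}u = \frac{\mu}{\sqrt{2\pi}}\sqrt{2\pi} = \mu$$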



R1. The reader might find it confusing at first that the parameter of a function is one of the results that we seek from this same function (as for the Poisson distribution). What is bothersome is putting such a thing into practice. In fact, everything will become clearer when we discuss later in this chapter the concepts of "likelihood estimators".

R2. It could be interesting for the reader to know that in practice (finance, quality assurance, etc.) it is common to have to calculate the mean of only the positive values of the random variable, which is then naturally defined as the "positive mean" and given by:

$$\mathrm{E}_+(X) = \frac{1}{\sigma\sqrt{2\pi}}\int_{0}^{+\infty}x\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\mathrm{d}x$$

We will see a practical example of this last relation in the section Economy during our study of the theoretical model of speculation of Louis Bachelier.

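Also we will now prove (...) that $\sigma$ is the standard deviation of X (in other words, that $\mathrm{V}(X) = \sigma^2$), and for this we recall that we have proved that (Huyghens relation):

$$\mathrm{V}(X) = \mathrm{E}(X^2) - \mathrm{E}(X)^2$$

We already know that at the level of the notations we have:

$$\mathrm{E}(X) = \mu \Rightarrow \mathrm{E}(X)^2 = \mu^2$$

then we first calculate $\mathrm{E}(X^2)$:

$$\mathrm{E}(X^2) = \int_{-\infty}^{+\infty}x^2 f(x)\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}x^2\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\mathrm{d}x$$

Let $y = (x-\mu)/(\sqrt{2}\sigma)$, which therefore leads us to:

$$\mathrm{E}(X^2) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty}(y\sqrt{2}\sigma+\mu)^2\,e^{-y^2}\,\mathrm{d}y = \frac{2\sigma^2}{\sqrt{\pi}}\int_{-\infty}^{+\infty}y^2e^{-y^2}\,\mathrm{d}y + \frac{2\sqrt{2}\mu\sigma}{\sqrt{\pi}}\int_{-\infty}^{+\infty}y\,e^{-y^2}\,\mathrm{d}y + \frac{\mu^2}{\sqrt{\pi}}\int_{-\infty}^{+\infty}e^{-y^2}\,\mathrm{d}y$$

And we know that (already proved above):

$$\int_{-\infty}^{+\infty}y\,e^{-y^2}\,\mathrm{d}y = 0 \qquad \int_{-\infty}^{+\infty}e^{-y^2}\,\mathrm{d}y = \sqrt{\pi}$$

It therefore remains to calculate only the first integral. To do this, we proceed by integration by parts (see Differential and Integral Calculus):

$$\int_a^b f(t)\,g'(t)\,\mathrm{d}t = f(t)\,g(t)\Big|_a^b - \int_a^b f'(t)\,g(t)\,\mathrm{d}t$$

which leads us to:

$$\int_{-\infty}^{+\infty}y^2e^{-y^2}\,\mathrm{d}y = \int_{-\infty}^{+\infty}y\left(y\,e^{-y^2}\right)\mathrm{d}y = \left[-\frac{y}{2}\,e^{-y^2}\right]_{-\infty}^{+\infty} + \frac{1}{2}\int_{-\infty}^{+\infty}e^{-y^2}\,\mathrm{d}y = \frac{\sqrt{\pi}}{2}$$

Then we get:

$$\mathrm{E}(X^2) = \frac{2\sigma^2}{\sqrt{\pi}}\,\frac{\sqrt{\pi}}{2} + \frac{2\sqrt{2}\mu\sigma}{\sqrt{\pi}}\cdot 0 + \frac{\mu^2}{\sqrt{\pi}}\sqrt{\pi} = \mu^2+\sigma^2$$

And finally:

$$\mathrm{V}(X) = \mathrm{E}(X^2) - \mathrm{E}(X)^2 = (\mu^2+\sigma^2)-\mu^2 = \sigma^2$$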





An additional meaning of the standard deviation of the Gauss-Laplace distribution is that of a measure of the width of the distribution since (this can be checked only with the aid of numerical integration) for any non-zero mean and standard deviation we have (thanks to John Cannin for the LaTeX figure):

[Figure: Normal density with the area within one standard deviation of the mean marked 68.2%; horizontal axis "Standard deviations", from -3 to 3]

Figure 4.85 - Sigma intervals for the Normal distribution


The width of the interval is of great importance in the interpretation of measurement uncertainties. Presenting a result as $N \pm \sigma$ means that the average value has about a 68.3% chance (probability) of lying between the limits $N - \sigma$ and $N + \sigma$, or approximately a 95.4% chance of lying between the limits $N - 2\sigma$ and $N + 2\sigma$, etc.
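These percentages can be reproduced with the error function, since for a Normal variable the probability of lying within k standard deviations of the mean is erf(k/sqrt(2)). The following sketch also checks, for one arbitrarily chosen mean and standard deviation (assumed values), that the result does not depend on them; the function name is hypothetical.

```python
from math import erf, sqrt
from statistics import NormalDist

def prob_within(k):
    """P(|X - mu| <= k*sigma) for a Normal variable, whatever mu and sigma."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, f"{prob_within(k):.4%}")

# Same computation through the cumulative distribution function,
# for an assumed non-standard Normal law (mean 12.5, standard deviation 0.8)
law = NormalDist(mu=12.5, sigma=0.8)
one_sigma = law.cdf(12.5 + 0.8) - law.cdf(12.5 - 0.8)
print(one_sigma)
```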

This concept is widely used in quality management in industrial business, especially with the Six Sigma methodology (see Industrial Engineering), which requires a mastery of $6\sigma$ on each side of the mean (!) of the manufacturing (or of anything else whose deviation is measured).

The second column of the table can easily be obtained with Maple 4.00b (or also with the spreadsheet software from Microsoft). For example for the first line:

>S:=evalf(int(1/sqrt(2*Pi)*exp(-x^2/2),x=-1..1));

and the first row of the third column:


If the Normal distribution was not centered, then we would just write for the second column:

>S:=evalf(int(1/sqrt(2*Pi)*exp(-(x-mu)^2/2),x=mu-1..mu+1));

and so on for any deviation and mean: we will then obtain exactly the same intervals!

The Gauss-Laplace distribution is not only a tool for data analysis but also for data generation. Indeed, this distribution is one of the most widely used in the world of multinationals that use statistical tools for risk management, project management and simulation, where a large number of random variables have to be controlled. The best examples of applications use the software packages CrystalBall or Palisade @Risk (this last one being my favorite...).

In this context of application (project management), it is also very common to use the sum (task duration) or the product (customer uncertainty) of random variables following Gauss-Laplace distributions. We will now see how to calculate this:

7.6.9.1 Sum of two random Normal variables

Let X, Y be two independent random variables. Suppose that X follows the distribution $\mathcal{N}(\mu_1,\sigma_1)$ and that Y follows the distribution $\mathcal{N}(\mu_2,\sigma_2)$. Then the random variable $Z = X + Y$ has a density equal to the convolution product of $f_X$ and $f_Y$. That is to say:

$$f_Z(s) = \int_{-\infty}^{+\infty}f_X(x)\,f_Y(s-x)\,\mathrm{d}x = \frac{1}{2\pi\sigma_1\sigma_2}\int_{-\infty}^{+\infty}e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}\,e^{-\frac{(s-x-\mu_2)^2}{2\sigma_2^2}}\,\mathrm{d}x$$

which is equivalent to the joint product (see Probabilities) of the probabilities of occurrence of the two continuous variables (remember the same kind of calculation in discrete form!).

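To simplify the expression, make the change of variable $t = x - \mu_1$ and let us write $a = \mu_1 + \mu_2 - s$ and $\sigma^2 = \sigma_1^2 + \sigma_2^2$. As:

$$(s-x-\mu_2)^2 = \left(-(-s+x+\mu_2)\right)^2 = (-s+x+\mu_2)^2 = (t+a)^2 \qquad (7.437)$$

we get, after a hard-to-guess rearrangement trick (completing the square in t):

$$f_Z(s) = \frac{1}{2\pi\sigma_1\sigma_2}\int_{-\infty}^{+\infty}e^{-\frac{t^2}{2\sigma_1^2}}\,e^{-\frac{(t+a)^2}{2\sigma_2^2}}\,\mathrm{d}t = \frac{1}{2\pi\sigma_1\sigma_2}\int_{-\infty}^{+\infty}e^{-\frac{\left(\sigma t+\frac{a\sigma_1^2}{\sigma}\right)^2+\frac{a^2\sigma_1^2\sigma_2^2}{\sigma^2}}{2\sigma_1^2\sigma_2^2}}\,\mathrm{d}t = \frac{e^{-\frac{a^2}{2\sigma^2}}}{2\pi\sigma_1\sigma_2}\int_{-\infty}^{+\infty}e^{-\frac{\left(\sigma t+\frac{a\sigma_1^2}{\sigma}\right)^2}{2\sigma_1^2\sigma_2^2}}\,\mathrm{d}t$$

We write:

$$u = \frac{\sigma t+\frac{a\sigma_1^2}{\sigma}}{\sqrt{2}\,\sigma_1\sigma_2} \Rightarrow \frac{\mathrm{d}u}{\mathrm{d}t} = \frac{\sigma}{\sqrt{2}\,\sigma_1\sigma_2} \Rightarrow \mathrm{d}t = \frac{\sqrt{2}\,\sigma_1\sigma_2}{\sigma}\,\mathrm{d}u$$

Then:

$$f_Z(s) = \frac{e^{-\frac{a^2}{2\sigma^2}}}{2\pi\sigma_1\sigma_2}\int_{-\infty}^{+\infty}e^{-\frac{\left(\sigma t+\frac{a\sigma_1^2}{\sigma}\right)^2}{2\sigma_1^2\sigma_2^2}}\,\mathrm{d}t = \frac{e^{-\frac{a^2}{2\sigma^2}}}{\sqrt{2}\,\pi\sigma}\int_{-\infty}^{+\infty}e^{-u^2}\,\mathrm{d}u$$

Knowing that (proved above):

$$\int_{-\infty}^{+\infty}e^{-u^2}\,\mathrm{d}u = \sqrt{\pi}$$

our relation becomes:

$$f_Z(s) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{a^2}{2\sigma^2}} = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(-s+\mu_1+\mu_2)^2}{2\sigma^2}} = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(s-\mu_1-\mu_2)^2}{2\sigma^2}}$$

We recognize the expression of the Gauss-Laplace distribution (Normal law) of mean $\mu_1 + \mu_2$ and standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$. Therefore, $X + Y$ follows the distribution written by physicists as (both arguments have the same units):

$$\mathcal{N}\left(\mu_1+\mu_2,\ \sqrt{\sigma_1^2+\sigma_2^2}\right) \qquad (7.444)$$

and as noted by most mathematicians and statisticians:

$$\mathcal{N}\left(\mu_1+\mu_2,\ \sigma_1^2+\sigma_2^2\right) \qquad (7.445)$$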

The fact that the sum of two independent Normal distributions always gives a Normal distribution as well is what we name in statistics the "stability of the sum" of the Gauss-Laplace distribution (Normal law). We will find such properties for other distributions that will be discussed later.
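The convolution result can be checked numerically with the standard library. In the sketch below the parameters of X and Y are arbitrary assumptions, the convolution integral is approximated by a midpoint rule on a wide interval, and the function name `convolution_density` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

X = NormalDist(mu=1.0, sigma=2.0)    # assumed law of X
Y = NormalDist(mu=-3.0, sigma=1.5)   # assumed law of Y

# Law predicted for Z = X + Y: means add, variances add
Z = NormalDist(X.mean + Y.mean, sqrt(X.variance + Y.variance))

def convolution_density(s, lo=-40.0, hi=40.0, steps=8000):
    """Midpoint-rule estimate of the integral of f_X(x) * f_Y(s - x) dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += X.pdf(x) * Y.pdf(s - x)
    return h * total

max_gap = max(abs(convolution_density(s) - Z.pdf(s)) for s in (-8, -5, -2, 0, 3))
print(max_gap)
```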

So, as for the Poisson distribution, any Normal distribution whose parameters are known is indefinitely divisible into a finite or infinite number of independent Normal distributions that are summed as:

$$\mathcal{N}(\mu,\sigma) = \sum_{i=1}^{n}\mathcal{N}\left(\frac{\mu}{n},\frac{\sigma}{\sqrt{n}}\right) \qquad (7.446)$$

The families of distributions stable by the sum form an important field of study in physics, finance and statistics, called "Lévy alpha-stable distributions". If time permits, I will present the details of this extremely important study in this chapter.

7.6.9.2 Product of two random Normal variables

Let X, Y be two real independent random variables. We denote by $f_X$ and $f_Y$ the corresponding densities and we seek to determine the density of the variable $Z = XY$ (a very important case, particularly in engineering).

Let f denote the density function of the pair (X, Y). Since X, Y are independent (see section Probabilities):

$$f(x,y) = f_X(x)\,f_Y(y) \qquad (7.447)$$

The distribution function of Z is:

$$F(z) = P(Z \leq z) = P(XY \leq z) = \iint_D f(x,y)\,\mathrm{d}x\,\mathrm{d}y = \iint_D f_X(x)\,f_Y(y)\,\mathrm{d}x\,\mathrm{d}y \qquad (7.448)$$

where $D = \{(x,y)\,|\,xy \leq z\}$.

D can be rewritten as a disjoint union (we do this to avoid a division by zero in the coming change of variables):

$$D = D_1 \cup D_2 \cup D_3 \qquad (7.449)$$

$$\begin{aligned} D_1 &= \{(x,y) \in \mathbb{R}^2\,|\,xy \leq z \wedge x > 0\} \\ D_2 &= \{(x,y) \in \mathbb{R}^2\,|\,xy \leq z \wedge x < 0\} \\ D_3 &= \{(x,y) \in \mathbb{R}^2\,|\,xy \leq z \wedge x = 0\} \end{aligned} \qquad (7.450)$$


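We have:

$$F(z) = \iint_{D_1} f_X(x)\,f_Y(y)\,\mathrm{d}x\,\mathrm{d}y + \iint_{D_2} f_X(x)\,f_Y(y)\,\mathrm{d}x\,\mathrm{d}y + \underbrace{\iint_{D_3} f_X(x)\,f_Y(y)\,\mathrm{d}x\,\mathrm{d}y}_{=0} \qquad (7.451)$$

The last integral is equal to zero because $D_3$ has zero measure (thickness) for the integration along x.

We then perform the following change of variables:

$$x = x \qquad t = xy \qquad (7.452)$$

The Jacobian of the transformation (see section Differential and Integral Calculus) is:

$$|J| = \left|\det\begin{pmatrix} 1 & 0 \\ -t/x^2 & 1/x \end{pmatrix}\right| = \frac{1}{|x|}$$

Therefore:

$$F(z) = \int_{-\infty}^{z}\int_{0}^{+\infty}f_X(x)\,f_Y(t/x)\,\frac{1}{|x|}\,\mathrm{d}x\,\mathrm{d}t + \int_{-\infty}^{z}\int_{-\infty}^{0}f_X(x)\,f_Y(t/x)\,\frac{1}{|x|}\,\mathrm{d}x\,\mathrm{d}t = \int_{-\infty}^{z}\int_{-\infty}^{+\infty}f_X(x)\,f_Y(t/x)\,\frac{1}{|x|}\,\mathrm{d}x\,\mathrm{d}t$$

Let $f_Z$ be the density of the variable Z. By definition:

$$P(Z \leq z) = F(z) = \int_{-\infty}^{z}f_Z(t)\,\mathrm{d}t$$

On the other hand:

$$F(z) = \int_{-\infty}^{z}\int_{-\infty}^{+\infty}f_X(x)\,f_Y(t/x)\,\frac{1}{|x|}\,\mathrm{d}x\,\mathrm{d}t$$

as we have seen. Therefore:

$$f_Z(t) = \int_{-\infty}^{+\infty}f_X(x)\,f_Y(t/x)\,\frac{1}{|x|}\,\mathrm{d}x$$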


What is a bit sad is that in the case of a Gauss-Laplace distribution (Normal distribution), this integral can only be calculated easily by numerical means... it is then necessary to use Monte Carlo type integration methods (see section Theoretical Computing).

However, according to some research done on the Internet, but without certainty, this integral may be calculated and gives a new distribution called the "Bessel distribution".
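A numerical treatment of this integral can be sketched as follows, under several assumptions: X and Y are both standard Normal, the inner integral uses the substitution x = exp(s) (so that dx/|x| = ds) with a midpoint rule, and the result is compared with a seeded Monte Carlo estimate of P(|XY| <= 1). Function names, bounds and step counts are illustrative choices, not part of the original text.

```python
import random
from math import exp, pi, sqrt

def phi(x):
    """Standard Normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def product_density(t, steps=600, s_lo=-12.0, s_hi=3.0):
    """f_Z(t) = integral of f_X(x) f_Y(t/x) / |x| dx for standard Normal X, Y, t != 0.

    With x = exp(s) the factor dx/|x| becomes ds; the factor 2 covers x < 0,
    which contributes the same amount by symmetry of the Normal density.
    """
    h = (s_hi - s_lo) / steps
    total = 0.0
    for i in range(steps):
        x = exp(s_lo + (i + 0.5) * h)
        total += phi(x) * phi(t / x)
    return 2 * h * total

# Deterministic estimate of P(|Z| <= 1): midpoint rule on (0, 1), doubled by symmetry
t_steps = 200
h_t = 1.0 / t_steps
quadrature = 2 * h_t * sum(product_density((j + 0.5) * h_t) for j in range(t_steps))

# Monte Carlo estimate of the same probability
random.seed(0)
n_samples = 200_000
hits = sum(
    abs(random.gauss(0, 1) * random.gauss(0, 1)) <= 1 for _ in range(n_samples)
)
monte_carlo = hits / n_samples

print(quadrature, monte_carlo)
```

The two estimates should agree to within the Monte Carlo sampling error, which supports the density formula without relying on any closed form.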

7.6.9.3 Bivariate Normal Distribution

If two Normal distributed random variables are independent, we know that their joint probability is equal to the product of their probabilities. So we have:

$$P = P_1P_2 = \frac{1}{\sigma_1\sqrt{2\pi}}\,e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}}\cdot\frac{1}{\sigma_2\sqrt{2\pi}}\,e^{-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}} = \frac{1}{2\pi\sigma_1\sigma_2}\,e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}}$$

Now comes an approach that we will often find in the following developments: to generalize simple algebra models, you have to think in a Linear Algebra way! Therefore we are left with two vectors involving a scalar product:

$$P = P_1P_2 = \frac{1}{2\pi\sigma_1\sigma_2}\,\exp\left[-\frac{1}{2}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}\cdot\begin{pmatrix} \dfrac{x_1-\mu_1}{\sigma_1^2} \\ \dfrac{x_2-\mu_2}{\sigma_2^2} \end{pmatrix}\right]$$

But we can do even better because for the moment there is no added value to this notation! Indeed, a subtle idea is to involve the determinant of a matrix (see section Linear Algebra) and the inverse of this same matrix in the previous relation:

$$P = P_1P_2 = \frac{1}{2\pi\left(\sigma_1^2\sigma_2^2 - 0\cdot 0\right)^{1/2}}\,\exp\left[-\frac{1}{2}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}^T\begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}^{-1}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}\right]$$

We thus find a particular case of the variance-covariance matrix. In the field of the bivariate Normal distribution it is customary to write this last relation in the following form:

$$f(x_1,x_2) = \frac{1}{2\pi|\Sigma|^{1/2}}\,e^{-\frac{1}{2}(\vec{x}-\vec{\mu})^T\Sigma^{-1}(\vec{x}-\vec{\mu})}$$


If we make a plot of this function we get: 



Figure 4.86 - Plot of the bivariate Normal function in MATLAB™ 







or another one (not with the same values) with corresponding projections: 

Figure 4.87 - Plot of the bivariate Normal function with pgfplots 

Now consider the important case in engineering, astronomy and quantum physics by returning to the following notation:

$$P = f(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2}\,e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}}$$

and by focusing on the iso-lines, that is to say the pairs of values of the two random variables for which we have:

$$\frac{1}{2\pi\sigma_1\sigma_2}\,e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}} = c^{te}$$

By doing some very basic algebraic manipulations, we get:

$$e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}} = 2\pi\sigma_1\sigma_2\,c^{te}$$

and we get:

$$\frac{(x_1-\mu_1)^2}{2\sigma_1^2}+\frac{(x_2-\mu_2)^2}{2\sigma_2^2} = \ln\left(\frac{1}{2\pi\sigma_1\sigma_2\,c^{te}}\right)$$

that is:

$$\frac{(x_1-\mu_1)^2}{2\sigma_1^2\ln\left(\dfrac{1}{2\pi\sigma_1\sigma_2\,c^{te}}\right)}+\frac{(x_2-\mu_2)^2}{2\sigma_2^2\ln\left(\dfrac{1}{2\pi\sigma_1\sigma_2\,c^{te}}\right)} = 1$$

We recognize here the analytical equation of an ellipse (see section Analytical Geometry)!

A plot of the iso-lines with a given mean vector $\vec{\mu}$ and $\Sigma = \begin{pmatrix} 25 & 0 \\ 0 & 9 \end{pmatrix}$ gives us:

Figure 4.88 - Plot of the iso-lines of the bivariate Normal function (non-correlated case)

But now recall that when we got:

$$f(x_1,x_2) = \frac{1}{2\pi|\Sigma|^{1/2}}\,e^{-\frac{1}{2}(\vec{x}-\vec{\mu})^T\Sigma^{-1}(\vec{x}-\vec{\mu})}$$

the variance-covariance matrix was zero everywhere except on the diagonal, implying the independence of the two random variables. We can obviously guess that the generalization is that the variance-covariance matrix is also non-zero off the diagonal, and then the two random variables are correlated. Consequently, the iso-lines become, with values such as a given mean vector $\vec{\mu}$ and $\Sigma = \begin{pmatrix} 10 & 5 \\ 5 & 5 \end{pmatrix}$:

Figure 4.89 - Plot of the iso-lines of the bivariate Normal function (correlated case)

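So the correlation rotates the axes of the ellipses! Note that we have therefore:

$$\Sigma = \begin{pmatrix} \sigma_{11}^2 & \sigma_{12} \\ \sigma_{21} & \sigma_{22}^2 \end{pmatrix} \Rightarrow \Sigma^{-1} = \frac{1}{\sigma_{11}^2\sigma_{22}^2-\sigma_{12}^2}\begin{pmatrix} \sigma_{22}^2 & -\sigma_{12} \\ -\sigma_{21} & \sigma_{11}^2 \end{pmatrix}$$

and thus:

$$|\Sigma| = \sigma_{11}^2\sigma_{22}^2 - \sigma_{12}^2 \qquad (7.469)$$

Recall that we saw during our study of the correlation coefficient that (well... normally... the R notation for the correlation is used only if the variances are estimated, but as it is the most common notation in practice we will still use it...):

$$\sigma_{12} = \mathrm{cov}(X_1,X_2) = c(X_1,X_2) = R_{X_1,X_2}\,\sigma_{11}\sigma_{22}$$

Finally:

$$\Sigma^{-1} = \frac{1}{\sigma_{11}^2\sigma_{22}^2-\sigma_{12}^2}\begin{pmatrix} \sigma_{22}^2 & -\sigma_{12} \\ -\sigma_{21} & \sigma_{11}^2 \end{pmatrix} = \frac{1}{\sigma_{11}^2\sigma_{22}^2\left(1-R_{12}^2\right)}\begin{pmatrix} \sigma_{22}^2 & -\sigma_{12} \\ -\sigma_{21} & \sigma_{11}^2 \end{pmatrix}$$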


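and the exponent of the exponential of the bivariate Normal takes a form that we can find very often in the literature:

$$\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}^T\frac{1}{\sigma_{11}^2\sigma_{22}^2\left(1-R_{12}^2\right)}\begin{pmatrix} \sigma_{22}^2 & -\sigma_{12} \\ -\sigma_{21} & \sigma_{11}^2 \end{pmatrix}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}$$

$$= \frac{1}{\sigma_{11}^2\sigma_{22}^2\left(1-R_{12}^2\right)}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}^T\begin{pmatrix} \sigma_{22}^2 & -R_{12}\sigma_{11}\sigma_{22} \\ -R_{12}\sigma_{11}\sigma_{22} & \sigma_{11}^2 \end{pmatrix}\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}$$

$$= \frac{\sigma_{22}^2(x_1-\mu_1)^2+\sigma_{11}^2(x_2-\mu_2)^2-2R_{12}\sigma_{11}\sigma_{22}(x_1-\mu_1)(x_2-\mu_2)}{\sigma_{11}^2\sigma_{22}^2\left(1-R_{12}^2\right)}$$

$$= \frac{1}{1-R_{12}^2}\left[\left(\frac{x_1-\mu_1}{\sigma_{11}}\right)^2+\left(\frac{x_2-\mu_2}{\sigma_{22}}\right)^2-2R_{12}\left(\frac{x_1-\mu_1}{\sigma_{11}}\right)\left(\frac{x_2-\mu_2}{\sigma_{22}}\right)\right]$$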

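Note that if the random variables are centered reduced, then we have:

$$\Sigma^{-1} = \frac{1}{1-R_{12}^2}\begin{pmatrix} 1 & -R_{12} \\ -R_{12} & 1 \end{pmatrix}$$

and thus the exponent of the exponential of the bivariate Normal distribution becomes:

$$-\frac{1}{2}\,\vec{x}^T\Sigma^{-1}\vec{x} = -\frac{1}{2\left(1-R_{12}^2\right)}\left[x_1^2+x_2^2-2R_{12}x_1x_2\right]$$

Thus, the density function of the bivariate Normal centered reduced distribution will be written:

$$f(x_1,x_2,R_{12}) = \frac{1}{2\pi\sqrt{1-R_{12}^2}}\,e^{-\frac{x_1^2+x_2^2-2R_{12}x_1x_2}{2\left(1-R_{12}^2\right)}}$$

$$= \frac{1}{\sqrt{2\pi\left(1-R_{12}^2\right)}}\,e^{-\frac{x_1^2}{2\left(1-R_{12}^2\right)}}\cdot\frac{1}{\sqrt{2\pi\left(1-R_{12}^2\right)}}\,e^{-\frac{x_2^2}{2\left(1-R_{12}^2\right)}}\cdot\sqrt{1-R_{12}^2}\;e^{\frac{R_{12}x_1x_2}{1-R_{12}^2}}$$

$$= \mathcal{N}\left(0,1-R_{12}^2\right)\mathcal{N}\left(0,1-R_{12}^2\right)\left(1-R_{12}^2\right)^{1/2}e^{\frac{R_{12}x_1x_2}{1-R_{12}^2}}$$

$$= \mathcal{N}(0,1)\,\mathcal{N}(0,1)\left(1-R_{12}^2\right)^{-1/2}e^{\frac{2R_{12}x_1x_2-R_{12}^2\left(x_1^2+x_2^2\right)}{2\left(1-R_{12}^2\right)}}$$

(the second argument of $\mathcal{N}$ denoting here the variance).

Thus, we can see that a bivariate Normal centered reduced distribution function can be constructed by the multiplication of two Normal centered and reduced distributions, themselves multiplied by a term that depends mainly on the correlation parameter. The latter term includes the nature of the dependence of the two random variables and provides the link between the marginal distributions (both Normal centered and reduced) to obtain the joint bivariate Normal distribution.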

If necessary (this can be very useful in practice), here is the Maple 4.00b code to plot a bivariate Normal function (taking the last example), even if it is also very simple to do with spreadsheet software like Microsoft Excel:

>f:=(x,y,rho,mu1,mu2,sigma1,sigma2)->(1/(2*Pi*sqrt(sigma1*sigma2*(1-rho^2))))
*exp((-1/(2*(1-rho^2)))*(((x-mu1)/sqrt(sigma1))^2+((y-mu2)/sqrt(sigma2))^2
-2*rho*((x-mu1)/sqrt(sigma1))*((y-mu2)/sqrt(sigma2))));

>plot3d(f(x,y,5/sqrt(10*5),3,2,10,5),x=-4..10,y=-4..9,grid=[40,40]);

and for the plot with the iso-lines:

>with(plots):

>contourplot(f(x,y,5/sqrt(10*5),3,2,10,5),x=-4..10,y=-4..9,grid=[40,40]);

and we can check that it is a probability density function by writing:

>int(int(f(x,y,5/sqrt(10*5),3,2,10,5),x=-infinity..+infinity),y=-infinity..+infinity);

or calculate the cumulative probability over the rectangle delimited by two intervals:

>evalf(int(int(f(x,y,5/sqrt(10*5),3,2,10,5),x=-3..+4),y=-5..+2));
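For readers without Maple, the same check can be sketched in Python with the standard library only. The function below mirrors the Maple definition (the arguments `var1` and `var2` are variances, as in the Maple call with 10 and 5); the integration box and grid size are assumptions chosen so that the mass outside the box is negligible.

```python
from math import exp, pi, sqrt

def bivariate_normal_pdf(x1, x2, rho, mu1, mu2, var1, var2):
    """Bivariate Normal density with means mu1, mu2, variances var1, var2, correlation rho."""
    z1 = (x1 - mu1) / sqrt(var1)
    z2 = (x2 - mu2) / sqrt(var2)
    norm = 2 * pi * sqrt(var1 * var2 * (1 - rho ** 2))
    return exp(-(z1 * z1 + z2 * z2 - 2 * rho * z1 * z2) / (2 * (1 - rho ** 2))) / norm

# Same parameter values as in the Maple call above
rho, mu1, mu2, var1, var2 = 5 / sqrt(10 * 5), 3.0, 2.0, 10.0, 5.0

# Midpoint rule on a box about 7 standard deviations wide on each side
n = 300
x_lo, x_hi, y_lo, y_hi = -20.0, 26.0, -15.0, 19.0
hx, hy = (x_hi - x_lo) / n, (y_hi - y_lo) / n
total = 0.0
for i in range(n):
    x = x_lo + (i + 0.5) * hx
    for j in range(n):
        y = y_lo + (j + 0.5) * hy
        total += bivariate_normal_pdf(x, y, rho, mu1, mu2, var1, var2)
total *= hx * hy
print(total)  # expected to be close to 1

# With zero correlation the density factorizes into the two marginal densities
def normal_pdf(x, mu, var):
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)

gap = abs(
    bivariate_normal_pdf(1.0, 4.0, 0.0, mu1, mu2, var1, var2)
    - normal_pdf(1.0, mu1, var1) * normal_pdf(4.0, mu2, var2)
)
print(gap)
```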

7.6.9.4 Normal Reduced Centered Distribution

The Gauss-Laplace distribution is not tabulated, as we would then need as many numerical tables as there are possible values for the mean $\mu$ and standard deviation $\sigma$ (which are the parameters of the function, as we have seen).

Therefore, by a change of variable, the Normal distribution becomes the Normal reduced centered distribution, more often named the "standard Normal distribution", where:

1. "Centered" refers to subtracting the mean $\mu$ from the measures (thus the distribution function is symmetric with respect to the vertical axis).

2. "Reduced" refers to the division by the standard deviation $\sigma$ (thus the distribution function has a unit variance).

By this change of variable, the variable k is replaced by the reduced centered random variable:

$$k^* = \frac{k-\mu}{\sigma} \qquad (7.476)$$

If the variable k has mean $\mu$ and standard deviation $\sigma$, then the variable $k^*$ has a mean of 0 and a standard deviation of 1 (this last variable is usually denoted by the letter Z).

Thus the relation:

$$P(k,\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(k-\mu)^2}{2\sigma^2}}$$

is therefore written (trivially) more simply:

$$P(k^*,0,1) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{k^{*2}}{2}}$$

which is just the explicit expression of the reduced centered Normal distribution ("standard Normal"), often denoted $\mathcal{N}(0,1)$, which we will find very often in the sections of physics, finance, quantitative management and engineering!


Calculating the integral of the previous relation over an interval cannot be done exactly in closed form. One possible and simple idea is then to express the exponential as a Taylor series and then to integrate the series term by term (making sure to take enough terms for convergence!).
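The term-by-term integration can be sketched as follows. Expanding exp(-t^2/2) as the sum of (-1)^n t^(2n) / (2^n n!) and integrating each term from 0 to z gives the series used below; the function name and the default of 40 terms are assumptions, adequate for moderate bounds such as |z| <= 3.

```python
from math import erf, factorial, pi, sqrt

def normal_prob_series(a, b, terms=40):
    """P(a <= Z <= b) for Z ~ N(0, 1), from the Taylor series integrated term by term."""
    def primitive(z):
        # integral from 0 to z of exp(-t^2/2) dt, as a truncated power series
        return sum(
            (-1) ** n * z ** (2 * n + 1) / (2 ** n * factorial(n) * (2 * n + 1))
            for n in range(terms)
        )
    return (primitive(b) - primitive(a)) / sqrt(2 * pi)

one_sigma = normal_prob_series(-1.0, 1.0)
three_sigma = normal_prob_series(-3.0, 3.0)
print(one_sigma, erf(1 / sqrt(2)))      # reference value from the error function
print(three_sigma, erf(3 / sqrt(2)))
```

For larger bounds the alternating terms grow before they decay, so more terms are needed and rounding errors increase, which is the caveat about convergence made in the text.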

Henry’s Line

Often in business it is the Gauss-Laplace (Normal) distribution that is analyzed, but common and easily accessible software like Microsoft Excel is unable to verify that the measured data follow a Normal distribution when we do the frequency analysis (there is no default integrated tool allowing users to check this assumption) and we do not have the original ungrouped data.

The trick is then to use the reduced centered variable that is built, as we have seen above, with the following relation:

$$k^* = \frac{k-\mu}{\sigma} \qquad (7.479)$$

The idea of Henry’s Line is then to use the linear relation between k and $k^*$ given by the equation of the line:

$$k^* = f(k) = \frac{k}{\sigma} - \frac{\mu}{\sigma}$$





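Suppose we have the following frequency analysis of 10,000 receipts in a supermarket:

Price of receipts | Number of receipts | Cumulated number of receipts | Relative frequencies of receipts
[0, 50[           |                    |                              |
[50, 100[         |                    |                              |
[100, 150[        |                    |                              |
[150, 200[        |                    |                              |
[200, 250[        |                    |                              |
[250, 300[        |                    |                              |
[300, 350[        |                    |                              |
[350, 400[        |                    |                              |
[400 and +        |                    |                              |

Table 4.25 - Supermarket receipt amount distribution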

If we now plot this in Microsoft Excel 11.8346 we get:

Figure 4.90 - Distribution of receipts amount

This looks very much like a Normal distribution, which allows us, without too much risk, to use the technique of Henry's line in this example.

But what can we do now? Well... now that we know the cumulative frequencies, it remains for us to calculate each $k^*$ using numerical tables or the NORMSINV() function of Microsoft Excel 11.8346 (remember that formal integration of the Gaussian function is not easy...).

This gives us the values of the standard Normal distribution $\mathcal{N}(0, 1)$ corresponding to these respective cumulative frequencies (cumulative distribution function). So we get (we leave it to the reader to take a statistical table or open a favorite software...):


Upper limit of the interval | Relative frequencies | Correspondence for k* of N(0, 1)

Table 4.26 - Cumulative relative frequencies to the Henry's line

Note that with this type of table in Microsoft Excel, the null and unit cumulative frequencies will generate errors. You should then play a little bit...

As we specified earlier, we have under discrete form:

$$k_i^* = f(k_i) = \frac{k_i}{\sigma} - \frac{\mu}{\sigma} \tag{7.481}$$

So, thanks to our table, we can plot the following chart in Microsoft Excel 11.8346 (obviously we could do a strict linear regression by the book, as seen in the chapter on Numerical Methods, with confidence intervals, prediction intervals and so on...):

Figure 4.91 - Linearized form of the distribution

From the linear regression given by Microsoft Excel 11.8346 (or calculated by you using the linear regression techniques seen in the chapter on Numerical Methods), which yields:

$$k^* = f(k) = \frac{k}{\sigma} - \frac{\mu}{\sigma} = 0.01k - 2 \tag{7.482}$$



we immediately deduce that: 

$$\sigma = 100 \qquad \mu = 200$$

This is thus a particular technique for a particular distribution! Similar techniques, more or less simple (or complicated, depending on the case...), exist for other distributions.
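The whole procedure of the Henry's line can be sketched in a few lines of code. This is only an illustration under stated assumptions: the cumulative frequencies below are synthetic (generated from a Normal distribution with mean 200 and standard deviation 100, because the measured values of the table are not reproduced here), and the helper `least_squares_line` is a hypothetical name for an ordinary least-squares fit.

```python
from statistics import NormalDist

# Upper limits of the classes of the example.
upper_limits = [50, 100, 150, 200, 250, 300, 350, 400]

# ASSUMPTION: synthetic cumulative relative frequencies, generated from
# N(200, 100) for illustration only (not the measured data of the table).
assumed = NormalDist(mu=200, sigma=100)
cumulative_freq = [assumed.cdf(k) for k in upper_limits]

# k* values: inverse of the standard Normal cumulative distribution function
# (the role played by NORMSINV() in the spreadsheet).
standard = NormalDist()
k_star = [standard.inv_cdf(p) for p in cumulative_freq]

def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = m x + q."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = least_squares_line(upper_limits, k_star)

# Identification with k* = k / sigma - mu / sigma.
sigma = 1.0 / slope
mu = -intercept * sigma
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"sigma = {sigma:.2f}, mu = {mu:.2f}")
```

With real grouped data the points would not lie exactly on a line; the quality of the alignment is precisely what the method lets us judge.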

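Let us now see another approximate approach to solve this problem. Let us take again our table for this example:

Price of receipts | Upper limit of the interval | Center | Relative cumulative frequencies in %
[0, 50[           | 50                          | 25     |
[50, 100[         | 100                         | 75     |
[100, 150[        | 150                         | 125    |
[150, 200[        | 200                         | 175    |
[200, 250[        | 250                         | 225    |
[250, 300[        | 300                         | 275    |
[300, 350[        | 350                         | 325    |
[350, 400[        | 400                         | 375    |
[400 and +        |                             |        |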


The average is now calculated using the central values $x_i$ of the intervals and the sample sizes $n_i$, according to the relation we have seen at the beginning of this section:

$$\mu = \frac{\sum_{i=1}^{n} n_i x_i}{\sum_{i=1}^{n} n_i}$$


Price of receipts | Center | Relative cumulative frequencies in %
[0, 50[           | 25     |
[50, 100[         | 75     |
[100, 150[        | 125    |
[150, 200[        | 175    |
[200, 250[        | 225    |
[250, 300[        | 275    |
[300, 350[        | 325    |
[350, 400[        | 375    |
[400 and +        |        |


The average calculated in this way is also quite close to the average obtained previously with the Henry's line.


The standard deviation is now also calculated using the central values of the intervals and the sample sizes, according to the relation seen at the beginning of this chapter:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} n_i (x_i - \mu)^2}{\sum_{i=1}^{n} n_i}}$$


Price of receipts | Center | Relative cumulative frequencies in %

Standard Deviation:


The standard deviation calculated in this way is also quite close to the standard deviation obtained with the method of the Henry's line.
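The two weighted relations used above (mean and standard deviation computed from the class centers) can be sketched as follows. The class counts are hypothetical, invented only for this illustration since the measured counts are not reproduced here; only the class centers come from the table.

```python
import math

# Centers of the classes [0,50[, [50,100[, ..., [350,400[ (from the table).
centers = [25, 75, 125, 175, 225, 275, 325, 375]

# ASSUMPTION: hypothetical class counts n_i, invented for this illustration.
counts = [440, 920, 1500, 1910, 1910, 1500, 920, 440]

total = sum(counts)

# Weighted mean: sum(n_i * x_i) / sum(n_i)
mean = sum(n * x for n, x in zip(counts, centers)) / total

# Weighted standard deviation: sqrt(sum(n_i * (x_i - mean)^2) / sum(n_i))
variance = sum(n * (x - mean) ** 2 for n, x in zip(counts, centers)) / total
std_dev = math.sqrt(variance)

print(f"mean = {mean:.2f}, standard deviation = {std_dev:.2f}")
```

Replacing every value of a class by its center is an approximation, which is why the results are only close to (and not equal to) those of the Henry's line.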

7.6.9.6 Q-Q plot

Another way to judge the quality of fit of experimental data to a theoretical distribution (whatever it is!) is the use of a "quantile-quantile plot", or simply "q-q plot".

The idea is pretty simple: it is based on comparing the experimental data with the theoretical data that are supposed to follow a particular distribution. Thus, in the case of our example, if we take the values of the mean ($\approx 200$) and standard deviation ($\approx 100$) obtained with the Henry's line as theoretical parameters for the Normal distribution, we get:




Price of receipts | Upper experimental limit (imposed) | Relative cumulative frequencies in % | Upper theoretical




Plotted, this gives us the famous Q-Q plot:

And of course we can compare the observed quantiles with those of the supposed theoretical distribution. The closer the points are to the line of unit slope and zero intercept, the better the fit! It's very visual, very simple and widely used by non-specialists in business statistics.


7.6.10 Log-Normal Distribution 

We say that a positive random variable $X$ follows a "log-normal function" (or "log-normal distribution") if by writing:

$$y = \ln(x) \tag{7.486}$$

we see that $y$ follows a Normal distribution of mean $\mu$ and variance $\sigma^2$ (moments of the Normal distribution).

By the properties of logarithms, a variable can be modeled by a log-normal distribution if it results from the multiplication of many small independent factors (the logarithm turns the product into a sum, and the Normal distribution is stable under addition).

The density function of $X$ for $x > 0$ is then (see section Differential And Integral Calculus):

$$f(x) = \frac{1}{\sigma x \sqrt{2\pi}}\, e^{-\frac{(\ln(x) - \mu)^2}{2\sigma^2}}$$

which can be calculated in Microsoft Excel 11.8346 with the LOGNORMDIST() function, or its inverse with LOGINV().

This type of scenario is frequent in physics, in technical maintenance or in financial markets in the options pricing model (see the respective sections of the book for various application examples). There is also an important remark about the log-normal distribution further on, when we develop the central limit theorem!

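Let us show that the cumulative probability function corresponds to a Normal distribution if we make the change of variables mentioned above:

$$\int_{0}^{+\infty} f(x)\,\mathrm{d}x = \int_{0}^{+\infty} \frac{1}{\sigma x \sqrt{2\pi}}\, e^{-\frac{(\ln(x) - \mu)^2}{2\sigma^2}}\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}} \int_{0}^{+\infty} \frac{1}{x}\, e^{-\frac{(\ln(x) - \mu)^2}{2\sigma^2}}\,\mathrm{d}x \tag{7.488}$$

by writing:

$$y = \ln(x) \Rightarrow \frac{\mathrm{d}y}{\mathrm{d}x} = \frac{1}{x} \Leftrightarrow \mathrm{d}x = x\,\mathrm{d}y$$

and (by definition):

$$x = e^{y}$$

we then get:

$$\int_{0}^{+\infty} f(x)\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}} \int_{0}^{+\infty} \frac{1}{x}\, e^{-\frac{(\ln(x) - \mu)^2}{2\sigma^2}}\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} \frac{1}{x}\, e^{-\frac{(y - \mu)^2}{2\sigma^2}}\, x\,\mathrm{d}y = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{(y - \mu)^2}{2\sigma^2}}\,\mathrm{d}y$$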


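So we found again the Normal distribution!

The mean (average) of $X$ is then given by (the natural logarithm not being defined for $x < 0$, we start the integral from zero):

$$E(X) = \int_{0}^{+\infty} x f(x)\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}} \int_{0}^{+\infty} x\,\frac{1}{x}\, e^{-\frac{(\ln(x) - \mu)^2}{2\sigma^2}}\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}} \int_{0}^{+\infty} e^{-\frac{(\ln(x) - \mu)^2}{2\sigma^2}}\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{(u - \mu)^2}{2\sigma^2} + u}\,\mathrm{d}u$$

where we performed the change of variable:

$$u = \ln(x) \qquad x = e^{u} \qquad x\,\mathrm{d}u = \mathrm{d}x \tag{7.493}$$

The expression:

$$-\frac{(u - \mu)^2}{2\sigma^2} + u$$

moreover being equal to:

$$-\frac{1}{2\sigma^2}\left( (u - (\mu + \sigma^2))^2 - (\mu + \sigma^2)^2 + \mu^2 \right)$$

the last integral also becomes:

$$E(X) = e^{\frac{(\mu + \sigma^2)^2 - \mu^2}{2\sigma^2}}\, \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{(u - (\mu + \sigma^2))^2}{2\sigma^2}}\,\mathrm{d}u = e^{\frac{(\mu + \sigma^2)^2 - \mu^2}{2\sigma^2}} = e^{\mu + \frac{\sigma^2}{2}}$$

and where we used the property that emerged during our study of the Normal distribution, that is to say that any integral of the form:

$$\int_{-\infty}^{+\infty} e^{-\frac{(x - c^{te})^2}{2\sigma^2}}\,\mathrm{d}x = \sigma\sqrt{2\pi}$$

always has the same value!

To calculate the variance, recall that for a random variable $X$, we have the Huygens theorem:

$$V(X) = E(X^2) - E(X)^2$$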



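Let us calculate $E(X^2)$ by proceeding similarly to the previous developments:

$$E(X^2) = \int_{0}^{+\infty} x^2 f(x)\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}} \int_{0}^{+\infty} x^2\,\frac{1}{x}\, e^{-\frac{(\ln(x) - \mu)^2}{2\sigma^2}}\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{u}\, e^{-\frac{(u - \mu)^2}{2\sigma^2}}\, e^{u}\,\mathrm{d}u = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{(u - \mu)^2}{2\sigma^2} + 2u}\,\mathrm{d}u$$

$$= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{(u - (\mu + 2\sigma^2))^2}{2\sigma^2}}\, e^{\frac{(\mu + 2\sigma^2)^2 - \mu^2}{2\sigma^2}}\,\mathrm{d}u = e^{\frac{(\mu + 2\sigma^2)^2 - \mu^2}{2\sigma^2}}\, \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{(u - (\mu + 2\sigma^2))^2}{2\sigma^2}}\,\mathrm{d}u$$

$$= e^{\frac{(\mu + 2\sigma^2)^2 - \mu^2}{2\sigma^2}} = e^{\frac{4\mu\sigma^2 + 4\sigma^4}{2\sigma^2}} = e^{2\mu + 2\sigma^2} = e^{2(\mu + \sigma^2)}$$

where once again we have the change of variable:

$$u = \ln(x) \qquad x = e^{u} \Rightarrow \mathrm{d}x = e^{u}\,\mathrm{d}u$$

and where we transformed the expression:

$$-\frac{(u - \mu)^2}{2\sigma^2} + 2u = -\frac{1}{2\sigma^2}\left( (u - (\mu + 2\sigma^2))^2 - (\mu + 2\sigma^2)^2 + \mu^2 \right)$$

Finally:

$$V(X) = E(X^2) - E(X)^2 = e^{2\mu + 2\sigma^2} - \left( e^{\mu + \frac{\sigma^2}{2}} \right)^2 = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2} = e^{2\mu + \sigma^2}\left( e^{\sigma^2} - 1 \right)$$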

Here is a plot example of the distribution and cumulative distribution of the Log-Normal function of parameters $(\mu, \sigma) = (0, 1)$:



Figure 4.93 - Log-Normal law (mass and cumulative distribution function) 
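The closed forms obtained above for the mean and the variance of the log-normal distribution can be checked numerically. The sketch below is an illustration only (the function name, the integration width and the number of steps are assumptions): it integrates over u = ln(x), exactly as in the change of variable of the derivation, with a simple midpoint rule.

```python
import math

def lognormal_moment(p, mu, sigma, steps=40_000, width=20.0):
    """E(X^p) for X = exp(Y), Y ~ N(mu, sigma^2), computed as the integral
    of exp(p*u) times the Normal density in u = ln(x) (midpoint rule)."""
    lower = mu - width * sigma
    h = 2.0 * width * sigma / steps
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(steps):
        u = lower + (i + 0.5) * h
        total += math.exp(p * u - (u - mu) ** 2 / (2.0 * sigma ** 2))
    return norm * total * h

mu, sigma = 0.0, 1.0  # same parameters as the plot example

mean_numeric = lognormal_moment(1, mu, sigma)
second_numeric = lognormal_moment(2, mu, sigma)
var_numeric = second_numeric - mean_numeric ** 2  # Huygens theorem

mean_closed = math.exp(mu + sigma ** 2 / 2.0)
var_closed = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)

print(mean_numeric, mean_closed)
print(var_numeric, var_closed)
```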

7.6.11 Continuous Uniform Distribution 

Let us choose $a < b$. We define the continuous uniform distribution function or "uniform function" by the relation:

$$P_{a,b}(x) = \frac{1}{b - a}\, 1_{[a,b]}$$

where $1_{[a,b]}$ means that outside the domain of definition $[a, b]$ the distribution function is zero. We will find this type of notation later in some other distribution functions.

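So we have for the cumulative distribution function, for $a \le x \le b$ (it is equal to 0 for $x < a$ and to 1 for $x > b$):

$$P(X \le x) = \int_{-\infty}^{x} \frac{1}{b - a}\, 1_{[a,b]}\,\mathrm{d}x = \frac{1}{b - a} \int_{a}^{x} \mathrm{d}x = \frac{x - a}{b - a}$$

It is indeed a distribution function because it satisfies (simple integral):

$$\int_{-\infty}^{+\infty} P_{a,b}\,\mathrm{d}x = \frac{1}{b - a} \int_{-\infty}^{+\infty} 1_{[a,b]}\,\mathrm{d}x = \frac{1}{b - a} \int_{a}^{b} \mathrm{d}x = \frac{1}{b - a}\, [x]_{a}^{b} = \frac{b - a}{b - a} = 1 \tag{7.506}$$

The continuous uniform function has for expected mean:

$$\mu = E(X) = \int_{-\infty}^{+\infty} x f(x)\,\mathrm{d}x = \frac{1}{b - a} \int_{a}^{b} x\,\mathrm{d}x = \frac{1}{b - a} \left[ \frac{x^2}{2} \right]_{a}^{b} = \frac{1}{b - a}\, \frac{b^2 - a^2}{2} = \frac{1}{2}\, \frac{(b + a)(b - a)}{b - a} = \frac{a + b}{2}$$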


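and for the variance using the Huygens theorem:

$$V(X) = E(X^2) - E(X)^2 = \int_{-\infty}^{+\infty} x^2 f(x)\,\mathrm{d}x - \left( \frac{a + b}{2} \right)^2 = \frac{1}{b - a} \int_{a}^{b} x^2\,\mathrm{d}x - \left( \frac{a + b}{2} \right)^2$$

$$= \frac{1}{b - a} \left[ \frac{x^3}{3} \right]_{a}^{b} - \left( \frac{a + b}{2} \right)^2 = \frac{1}{b - a}\, \frac{b^3 - a^3}{3} - \left( \frac{a + b}{2} \right)^2 = \frac{1}{b - a}\, \frac{(b - a)(b^2 + ab + a^2)}{3} - \left( \frac{a + b}{2} \right)^2$$

$$= \frac{1}{3}(b^2 + ab + a^2) - \frac{1}{4}(b^2 + 2ab + a^2) = \frac{4(b^2 + ab + a^2) - 3(b^2 + 2ab + a^2)}{12}$$

$$= \frac{4b^2 + 4ab + 4a^2 - 3b^2 - 6ab - 3a^2}{12} = \frac{b^2 - 2ab + a^2}{12} = \frac{(b - a)^2}{12}$$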

Here is a plot example of the distribution and cumulative distribution of the continuous uniform 
function of parameters (a, b ) = (0, 1): 

Figure 4.94 - Uniform continuous law (mass and cumulative distribution function) 


This function is often used in business simulation to indicate that the random variable has equal probabilities of taking a value within a certain interval (typically in portfolio returns or in the estimation of project durations). The best example of application is again the CrystalBall or @Risk software that integrates with Microsoft Project.

Let us see an interesting result of the continuous uniform distribution (which also applies to the discrete one...).

I often hear managers (who consider themselves to be at a high level) claim that if we have a measure with an equal probability of occurring in a given closed interval, then the sum of two such independent random variables also has an equal probability over its interval!



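We will now prove that this is not the case (if someone has a more elegant proof I'm interested)!

Proof 4.33.3. Consider two independent random variables $X$ and $Y$ that follow a uniform distribution on a closed interval $[0, a]$. We are looking for the density of their sum, written:

$$Z = X + Y \tag{7.509}$$

Then we have:

$$f_X(x) = f_Y(y) = \begin{cases} 1 & \text{if } 0 \le x, y \le a \\ 0 & \text{otherwise} \end{cases}$$

(strictly speaking the height of a uniform density on $[0, a]$ is $1/a$; it is taken equal to 1 here, which amounts to $a = 1$, to lighten the notation) with the variable:

$$0 \le z \le 2a$$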
To calculate the distribution of the sum, remember that in discrete terms this is equivalent to summing the joint products of probabilities (see section Probabilities) of the occurrences of the two variables; in the continuous case the sum becomes an integral (a convolution).

That is to say:

$$f_Z(z) = \int_{-\infty}^{+\infty} f_X(z - y)\, f_Y(y)\,\mathrm{d}y \tag{7.513}$$

As $f_Y(y) = 1$ if $0 \le y \le a$ and 0 otherwise, the previous convolution reduces to:

$$f_Z(z) = \int_{0}^{a} f_X(z - y)\,\mathrm{d}y \tag{7.514}$$

The integrand is by definition 0 except, by construction, in the interval $0 \le z - y \le a$ where it is equal to 1.

Let us focus on the limits of the integral, which are in this case the only interesting point... First we make a change of variables by writing:

$$u = z - y \tag{7.515}$$

$$\mathrm{d}u = -\mathrm{d}y \tag{7.516}$$

After the change of variable, the integral can then be written:

$$f_Z(z) = \int_{0}^{a} f_X(z - y)\,\mathrm{d}y = -\int_{z}^{z - a} f_X(u)\,\mathrm{d}u = \int_{z - a}^{z} f_X(u)\,\mathrm{d}u \tag{7.517}$$


Remembering that we have seen at the beginning that $0 \le z \le 2a$, we have immediately that the integral is zero if $z < 0$ or $z > 2a$.

We will consider two cases for the interval, because the convolution of these two rectangular functions can be split according to the situation where at first they cross (nest), that is to say where $0 \le z \le a$, and then recede from each other, that is to say where $a \le z \le 2a$.

• In the first case (nest) where $0 \le z \le a$:

$$f_Z(z) = \int_{z - a}^{z} f_X(u)\,\mathrm{d}u = \int_{0}^{z} \mathrm{d}u = u\big|_{0}^{z} = z$$

where we changed the lower bound to 0 because anyway $f_X(u)$ is zero for any negative value (and when $0 \le z \le a$, $z - a$ is precisely zero or negative!).

• In the second case (dislocation) where $a \le z \le 2a$:

$$f_Z(z) = \int_{z - a}^{z} f_X(u)\,\mathrm{d}u = \int_{z - a}^{a} \mathrm{d}u = a - (z - a) = 2a - z$$

where we changed the upper bound to $a$ because anyway $f_X(u)$ is zero for any higher value (and when $a \le z \le 2a$, $z$ is precisely larger than or equal to $a$).

So in the end, we have:

$$f_Z(z) = \begin{cases} z & \text{if } 0 \le z \le a \\ 2a - z & \text{if } a \le z \le 2a \\ 0 & \text{otherwise} \end{cases} \tag{7.520}$$

□ Q.E.D.

This is a particular case, deliberately simplified, of the triangular distribution that we will discover just after...

This result (which may perhaps not seem intuitive) can be checked in a few seconds with spreadsheet software like Microsoft Excel 11.8346 using the RANDBETWEEN() and the FREQUENCY() functions.
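The same check can be sketched in code instead of a spreadsheet. The example below is a hedged illustration, not the author's method: it evaluates the convolution integral numerically with a midpoint rule and compares it with the triangular result. The function names are hypothetical, and the uniform density is written with its normalized height 1/a, so that with a = 1 the values coincide with those of the proof above.

```python
def uniform_density(x, a):
    """Density of the uniform distribution on [0, a] (height 1/a)."""
    return 1.0 / a if 0.0 <= x <= a else 0.0

def sum_density_numeric(z, a, steps=20_000):
    """f_Z(z) as the convolution integral of f_X(z - y) f_Y(y) over y in
    [0, a], approximated by a midpoint Riemann sum."""
    h = a / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += uniform_density(z - y, a) * uniform_density(y, a) * h
    return total

def sum_density_triangle(z, a):
    """Closed form: z / a^2 on [0, a], (2a - z) / a^2 on [a, 2a]."""
    if 0.0 <= z <= a:
        return z / a ** 2
    if a < z <= 2.0 * a:
        return (2.0 * a - z) / a ** 2
    return 0.0

a = 1.0  # with a = 1 the closed form reduces to z and 2a - z
for z in (0.25, 0.5, 1.0, 1.5, 1.9):
    print(z, sum_density_numeric(z, a), sum_density_triangle(z, a))
```

The density of the sum is clearly not constant: it rises linearly up to z = a and then decreases linearly, which is the shape of a triangle.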



7.6.12 Triangular Distribution 

Let $a < c < b$. We define the "triangular distribution" (or "triangular function") by construction based on the following two distribution functions:

$$P_{a,c}(x) = \frac{2(x - a)}{(b - a)(c - a)}\, 1_{[a,c]} \qquad \text{and} \qquad P_{c,b}(x) = \frac{2(b - x)}{(b - a)(b - c)}\, 1_{]c,b]}$$

where $a$ is often assimilated with the optimistic value, $c$ with the modal value and $b$ with the pessimistic value.

It is also the only way to write this distribution function if the reader keeps in mind that a triangle with a base of length $b - a$ must have a height $h$ equal to $2/(b - a)$ for its total area to be equal to unity (we will soon prove it).

Here is a plot example of the triangular distribution and cumulative distribution for the parameters $(a, c, b) = (0, 3, 5)$:

Figure 4.95 - Triangular law (mass and cumulative distribution function)

The slope of the first straight line (increasing from left) is obviously:

$$\frac{2}{(b - a)(c - a)}$$

and the slope of the second straight line (decreasing to the right):

$$-\frac{2}{(b - a)(b - c)}$$

This function is a distribution function if it satisfies:

$$P = \int_{-\infty}^{+\infty} \left( P_{a,c}(x) + P_{c,b}(x) \right) \mathrm{d}x = 1$$


It is in this case simply the area of the triangle, which we recall is the base multiplied by the height divided by 2 (see section Geometric Shapes):

$$P = \frac{1}{2}\,(b - a)\,\frac{2}{b - a} = 1$$



This function is widely used in project management in the context of task duration estimations or in industrial simulations, where $a$ corresponds to the optimistic value, $c$ to the expected value (mode) and $b$ to the pessimistic value. The best example of application is again the software CrystalBall or @Risk that are add-ins for Microsoft

The triangular function has also for mean (average):

$$\mu = \int_{-\infty}^{+\infty} x f(x)\,\mathrm{d}x = \int_{a}^{c} x\, \frac{2(x - a)}{(b - a)(c - a)}\,\mathrm{d}x + \int_{c}^{b} x\, \frac{2(b - x)}{(b - a)(b - c)}\,\mathrm{d}x$$

$$= \frac{2}{(b - a)(c - a)} \left[ \frac{1}{3}x^3 - \frac{1}{2}a x^2 \right]_{a}^{c} + \frac{2}{(b - a)(b - c)} \left[ \frac{1}{2}b x^2 - \frac{1}{3}x^3 \right]_{c}^{b}$$

$$= \frac{2}{(b - a)(c - a)} \left( \frac{1}{3}c^3 - \frac{1}{2}a c^2 + \frac{1}{6}a^3 \right) + \frac{2}{(b - a)(b - c)} \left( \frac{1}{6}b^3 - \frac{1}{2}b c^2 + \frac{1}{3}c^3 \right)$$

$$= \frac{-\frac{1}{3}bc^3 - \frac{1}{3}a^3 c + \frac{1}{3}ba^3 - \frac{1}{3}ab^3 + \frac{1}{3}cb^3 + \frac{1}{3}ac^3}{(b - a)(c - a)(b - c)}$$

$$= \frac{1}{3}\, \frac{-bc^3 - a^3 c + ba^3 - ab^3 + cb^3 + ac^3}{(b - a)(c - a)(b - c)} = \frac{1}{3}\, \frac{(a + b + c)(b - a)(c - a)(b - c)}{(b - a)(c - a)(b - c)} = \frac{a + b + c}{3}$$


and for variance:

$$\sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,\mathrm{d}x = \int_{a}^{c} (x - \mu)^2\, \frac{2(x - a)}{(b - a)(c - a)}\,\mathrm{d}x + \int_{c}^{b} (x - \mu)^2\, \frac{2(b - x)}{(b - a)(b - c)}\,\mathrm{d}x$$

$$= -\frac{1}{6}\, \frac{c\,(-3c^3 + 8c^2\mu + 4c^2 a - 6c\mu^2 - 12c\mu a + 12\mu^2 a)}{(b - a)(c - a)} + \frac{1}{6}\, \frac{a^2 (a^2 - 4\mu a + 6\mu^2)}{(b - a)(c - a)}$$

$$\quad - \frac{1}{6}\, \frac{c\,(-3c^3 + 8c^2\mu + 4c^2 b - 6c\mu^2 - 12c\mu b + 12\mu^2 b)}{(b - a)(b - c)} + \frac{1}{6}\, \frac{b^2 (b^2 - 4\mu b + 6\mu^2)}{(b - a)(b - c)}$$

$$= \mu^2 - \frac{2}{3}\mu\,(a + b + c) + \frac{1}{6}\left( a^2 + b^2 + c^2 + ab + ac + bc \right)$$

We can replace $\mu$ by the result obtained before and we get after simplification (it is boring algebra...):

$$\sigma^2 = \frac{a^2 + b^2 + c^2 - ab - ac - bc}{18}$$
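The normalization, the mean and the variance of the triangular distribution can be checked numerically with a short sketch. This is an illustration under assumptions (function names and the midpoint integration rule are not from the text), using the parameters (a, c, b) = (0, 3, 5) of the plot example.

```python
def triangular_density(x, a, c, b):
    """Triangular density with minimum a, mode c and maximum b (a < c < b)."""
    if a <= x <= c:
        return 2.0 * (x - a) / ((b - a) * (c - a))
    if c < x <= b:
        return 2.0 * (b - x) / ((b - a) * (b - c))
    return 0.0

def raw_moment(p, a, c, b, steps=100_000):
    """Integral of x^p f(x) over [a, b], by a midpoint Riemann sum."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += x ** p * triangular_density(x, a, c, b)
    return total * h

a, c, b = 0.0, 3.0, 5.0

area = raw_moment(0, a, c, b)
mean_numeric = raw_moment(1, a, c, b)
var_numeric = raw_moment(2, a, c, b) - mean_numeric ** 2

mean_closed = (a + b + c) / 3.0
var_closed = (a ** 2 + b ** 2 + c ** 2 - a * b - a * c - b * c) / 18.0

print(area)
print(mean_numeric, mean_closed)
print(var_numeric, var_closed)
```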

We can show that the sum of two independent random variables, each uniformly distributed on $[a, b]$ (i.e. independent and identically distributed), follows a triangular distribution on $[2a, 2b]$, but if they do not have the same limits, then their sum gives something that has no name to my knowledge.
7.6.13 Pareto Distribution 

The "Pareto distribution" (or "Pareto law"), also named "power law" or "scale law", is the formalization of the 80-20 principle. This decision tool helps determine the critical factors (about 20%) influencing the majority (80%) of the goal.

This distribution is a fundamental and basic tool in quality management (see Industrial Engineering and Quantitative Management sections). It is also used in reinsurance. The theory of queues also took some interest in this distribution when research in the 1990s showed that it also seems to explain well a number of variables observed in Internet traffic (and more generally on all high speed data networks).

A random variable is said by definition to follow a Pareto distribution if its cumulative distribution function is given by:

$$P(X \le x) = 1 - \left( \frac{x_m}{x} \right)^{k} \tag{7.530}$$

with $x$ that must be greater than or equal to $x_m$.

The Pareto density function (distribution function) is then given by:

$$f(x) = \frac{\mathrm{d}}{\mathrm{d}x} \left( 1 - \left( \frac{x_m}{x} \right)^{k} \right) = -x_m^{k}\, \frac{\mathrm{d}}{\mathrm{d}x} \frac{1}{x^{k}} = k\, \frac{x_m^{k}}{x^{k+1}}$$



with $k \in \mathbb{R}_+$ and $x \ge x_m > 0$ (then $x > 0$). The Pareto distribution is defined by two parameters, $x_m$ and $k$ (named "Pareto index"). This distribution is also said to be "scale invariant" or a "fractal distribution", because of the following property:

$$f(c^{te} \cdot x) = k x_m^{k} (c^{te} \cdot x)^{-k-1} = (c^{te})^{-k-1}\, k x_m^{k}\, x^{-k-1} = (c^{te})^{-k-1} f(x) \propto f(x) \tag{7.532}$$
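A short sketch can illustrate both the scale invariance property and the fact that the density is the derivative of the cumulative distribution function. The function names, the constant c and the finite-difference step are assumptions chosen for the illustration.

```python
def pareto_cdf(x, x_m, k):
    """P(X <= x) = 1 - (x_m / x)^k for x >= x_m, 0 below the threshold."""
    return 1.0 - (x_m / x) ** k if x >= x_m else 0.0

def pareto_density(x, x_m, k):
    """f(x) = k x_m^k / x^(k+1) for x >= x_m, 0 below the threshold."""
    return k * x_m ** k / x ** (k + 1) if x >= x_m else 0.0

x_m, k = 1.0, 2.0
c = 3.0      # arbitrary scale factor (the constant c^te of the text)
eps = 1e-5   # step of the central finite difference

for x in (1.5, 2.0, 10.0):
    # Scale invariance: f(c x) / f(x) does not depend on x.
    ratio = pareto_density(c * x, x_m, k) / pareto_density(x, x_m, k)
    # Density recovered as the numerical derivative of the CDF.
    slope = (pareto_cdf(x + eps, x_m, k) - pareto_cdf(x - eps, x_m, k)) / (2 * eps)
    print(x, ratio, c ** (-k - 1), slope, pareto_density(x, x_m, k))
```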

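The Pareto function is indeed a distribution function since, the cumulative distribution function being known, we have:

$$\int_{x_m}^{+\infty} f(x)\,\mathrm{d}x = \left[ 1 - \left( \frac{x_m}{x} \right)^{k} \right]_{x_m}^{+\infty} = \left( 1 - \left( \frac{x_m}{+\infty} \right)^{k} \right) - \left( 1 - \left( \frac{x_m}{x_m} \right)^{k} \right) = (1 - 0) - (1 - 1^{k}) = 1$$

The expected mean is given by:

$$\mu = E(X) = \int_{x_m}^{+\infty} x f(x)\,\mathrm{d}x = \int_{x_m}^{+\infty} x\, k\, \frac{x_m^{k}}{x^{k+1}}\,\mathrm{d}x = k x_m^{k} \int_{x_m}^{+\infty} \frac{1}{x^{k}}\,\mathrm{d}x = k x_m^{k} \left[ -\frac{1}{k - 1}\, \frac{1}{x^{k-1}} \right]_{x_m}^{+\infty} = \frac{k x_m}{k - 1}$$

if $k > 1$. If $k \le 1$, the mean does not exist. To calculate the variance, using the Huygens relation:

$$V(X) = E(X^2) - E(X)^2 \tag{7.535}$$

we get:

$$E(X^2) = \int_{x_m}^{+\infty} x^2 f(x)\,\mathrm{d}x = k x_m^{k} \int_{x_m}^{+\infty} \frac{1}{x^{k-1}}\,\mathrm{d}x = k x_m^{k} \left[ -\frac{1}{k - 2}\, \frac{1}{x^{k-2}} \right]_{x_m}^{+\infty} = \frac{k x_m^{2}}{k - 2}$$

if $k > 2$. If $k \le 2$, $E(X^2)$ doesn't exist. So if $k > 2$:

$$\sigma^2 = V(X) = \frac{k x_m^{2}}{k - 2} - \left( \frac{k x_m}{k - 1} \right)^2 = \frac{k x_m^{2}}{(k - 1)^2 (k - 2)} \tag{7.537}$$

If $k \le 2$, the variance doesn't exist. Here is a plot example of the Pareto distribution and cumulative distribution for the parameters $(x_m, k) = (1, 2)$: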



See that when $k \to +\infty$ the distribution approaches $\delta(x - x_m)$ where $\delta$ is the Dirac delta function.

There is another important way to deduce the family of Pareto distributions, which allows us to understand many things about other distributions and which is often presented as follows:

Let us write $x_0$ the threshold beyond which we calculate the mean of the considered quantity, and $E(y)$ the mean beyond this threshold $x_0$, assumed to be proportional (linearly dependent) to the chosen threshold:

$$E(y) = a x_0 + b$$

This functional relation expresses the idea that the conditional mean beyond the threshold $x_0$ is a multiple of this threshold plus a constant, that is to say a linear function of the threshold.

Thus, in project management, for example, we could say that once a certain threshold of time is exceeded, the expected duration is a multiple of this threshold plus a constant.

If a linear relation of this type exists and is satisfied, then we talk about a probability distribution in the form of a generalized Pareto distribution.

Consider the mean of the Bayesian conditional function given by (see section Probabilities): 




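If we write $F(y)$ the cumulative distribution function associated with the density $f(y)$, then we have by definition:

$$\mathrm{d}F(y) = f(y)\,\mathrm{d}y$$

and if we define:

$$\bar{F}_X(x_0) = P(X > x_0) = 1 - F_X(x_0)$$

which we can assimilate to the "tail of the distribution", we get:

$$E(Y) = \frac{1}{\bar{F}(x_0)} \int_{x_0}^{+\infty} y\,\mathrm{d}F(y)$$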


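and therefore we seek the very special case where:

$$E(Y) = \frac{1}{\bar{F}(x_0)} \int_{x_0}^{+\infty} y\,\mathrm{d}F(y) = a x_0 + b$$

that is to say:

$$\int_{x_0}^{+\infty} y\,\mathrm{d}F(y) = (a x_0 + b)\, \bar{F}(x_0)$$

Differentiating with respect to $x_0$, we find:

$$\frac{\mathrm{d}}{\mathrm{d}x_0} \left( \int_{x_0}^{+\infty} y\,\mathrm{d}F(y) \right) = \frac{\mathrm{d}}{\mathrm{d}x_0} \left( (a x_0 + b)\, \bar{F}(x_0) \right)$$

The derivative of the integral above is the derivative of a constant (the value of the antiderivative at $+\infty$) minus the derivative of the antiderivative evaluated at $x_0$. So we have (writing from now on $x$ for the threshold $x_0$):

$$\frac{\mathrm{d}}{\mathrm{d}x} \int_{x}^{+\infty} y\,\mathrm{d}F(y) = \frac{\mathrm{d}}{\mathrm{d}x} \int_{x}^{+\infty} y f(y)\,\mathrm{d}y = -x f(x) = -x\, \frac{\mathrm{d}F(x)}{\mathrm{d}x} = a \bar{F}(x) + (a x + b)\, \frac{\mathrm{d}\bar{F}(x)}{\mathrm{d}x}$$

and as:

$$\mathrm{d}\bar{F} = \mathrm{d}(1 - F) = -\mathrm{d}F$$

it comes:

$$x\, \frac{\mathrm{d}\bar{F}(x)}{\mathrm{d}x} = a \bar{F}(x) + (a x + b)\, \frac{\mathrm{d}\bar{F}(x)}{\mathrm{d}x}$$

After simplification and rearrangement we obtain:

$$a \bar{F}(x)\,\mathrm{d}x = -\left( x(a - 1) + b \right) \mathrm{d}\bar{F}(x)$$

which is a differential equation in $\bar{F}(x)$. Its resolution provides all forms of the sought Pareto distributions, according to the values taken by the parameters $a$ and $b$.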

To solve this differential equation, consider the special case where $a > 1$, $b = 0$. Then we have:

$$a \bar{F}(x)\,\mathrm{d}x = -x(a - 1)\,\mathrm{d}\bar{F}(x) \tag{7.552}$$

By writing:

$$k = \frac{a}{a - 1}$$


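We then get:

$$\frac{1}{x}\,\mathrm{d}x = -\frac{1}{k \bar{F}(x)}\,\mathrm{d}\bar{F}(x)$$

and therefore:

$$\ln(x) = -\frac{1}{k} \ln\left( \bar{F}(x) \right) + c^{te}$$

It comes:

$$x = e^{-\frac{1}{k} \ln\left( \bar{F}(x) \right) + c^{te}} = e^{c^{te}}\, \bar{F}(x)^{-\frac{1}{k}}$$

and therefore:

$$\bar{F}(x)^{\frac{1}{k}} = \frac{e^{c^{te}}}{x}$$

We have (writing $x_m$ for the constant $e^{c^{te}}$):

$$\bar{F}(x) = \left( \frac{e^{c^{te}}}{x} \right)^{k} = \left( \frac{x_m}{x} \right)^{k}$$

Then it comes for the cumulative distribution function:

$$F(x) = 1 - \bar{F}(x) = 1 - \left( \frac{x_m}{x} \right)^{k}$$

If we seek the distribution function, we differentiate with respect to $x$ to get:

$$f(x) = \frac{\mathrm{d}F(x)}{\mathrm{d}x} = k\, \frac{x_m^{k}}{x^{k+1}}$$

This is the Pareto distribution we have used since the beginning, called "Pareto distribution of type I" (we won't see in this book those of type II).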

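An interesting thing to observe is the resolution of the following differential equation:

$$a \bar{F}(x)\,\mathrm{d}x = -\left( x(a - 1) + b \right) \mathrm{d}\bar{F}(x)$$

when $a = 1$, $b > 0$. The differential equation is then reduced to:

$$\bar{F}(x)\,\mathrm{d}x = -b\,\mathrm{d}\bar{F}(x)$$

that is to say:

$$-\frac{1}{b}\,\mathrm{d}x = \frac{1}{\bar{F}(x)}\,\mathrm{d}\bar{F}(x)$$

After integration:

$$-\frac{1}{b}\,x = \ln\left( \bar{F}(x) \right)$$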


and therefore:

$$\bar{F}(x) = e^{-\frac{1}{b} x} \tag{7.565}$$

If we make a small change in notation:

$$\bar{F}(x) = e^{-\lambda x} \tag{7.566}$$

and we write the cumulative distribution function:

$$F(x) = 1 - \bar{F}(x) = 1 - e^{-\lambda x} \tag{7.567}$$

then by differentiating we get the distribution function of the exponential distribution:

$$f(x) = \lambda e^{-\lambda x} \tag{7.568}$$

So the exponential distribution has a conditional mean beyond the threshold that is equal to:

$$E(y) = a x_0 + b \,\Big|_{a = 1,\; b = \frac{1}{\lambda}} = x_0 + b = x_0 + \frac{1}{\lambda} = x_0 + \sigma \tag{7.569}$$

So the conditional mean beyond the threshold is equal to the threshold itself plus the standard deviation of the distribution.


7.6.14 Exponential Distribution 

We define the "exponential distribution" (or "exponential law") by the following distribution function:

$$P(x) = \lambda e^{-\lambda x}\, 1_{[0, +\infty[}$$

with $\lambda > 0$ which, as we will immediately see, is in fact the inverse of the mean, and where $x$ is a random variable without memory. This law is also sometimes denoted $\mathcal{E}(\lambda)$.

In fact the exponential distribution naturally appears from simple developments (see the Nuclear Physics chapter for example) under assumptions that impose a constancy in the aging of the phenomenon. In the section of Quantitative Management, we have also proved in detail, in the part on the theory of queues, that this law is without memory. That is to say that the cumulative probability that a phenomenon occurs between the times $t$ and $t + s$, if it has not occurred before, is the same as the cumulative probability that it occurs between the times 0 and $s$.
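The memoryless property stated above can be illustrated numerically with the cumulative distribution function $1 - e^{-\lambda x}$ of the exponential law. The values of λ, t and s below are arbitrary assumptions chosen for the illustration.

```python
import math

def exp_cdf(x, lam):
    """Cumulative distribution function of the exponential law."""
    return 1.0 - math.exp(-lam * x) if x >= 0.0 else 0.0

lam, t, s = 0.5, 3.0, 2.0  # arbitrary illustrative values

# Probability of occurring between t and t + s, knowing that the
# phenomenon has not occurred before t.
conditional = (exp_cdf(t + s, lam) - exp_cdf(t, lam)) / (1.0 - exp_cdf(t, lam))

# Probability of occurring between 0 and s.
unconditional = exp_cdf(s, lam)

print(conditional, unconditional)
```

Both values coincide whatever the choice of t, which is exactly what "without memory" means.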


R1. This function occurs frequently in nuclear physics (see chapter of the same name) or quantum physics (see also chapter of the same name), as well as in reliability (see Industrial Engineering) or in the theory of queues (see section Quantitative Management).

R2. We can get this distribution in Microsoft Excel 11.8346 with the EXPONDIST() function.

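It is also really a distribution function because it verifies:

$$\int_{-\infty}^{+\infty} P_\lambda(x)\,\mathrm{d}x = \lambda \int_{0}^{+\infty} e^{-\lambda x}\,\mathrm{d}x = \lambda \left[ -\frac{1}{\lambda}\, e^{-\lambda x} \right]_{0}^{+\infty} = -(0 - 1) = 1$$

The exponential distribution has for expected mean, using integration by parts:

$$\mu = \int_{-\infty}^{+\infty} x P_\lambda(x)\,\mathrm{d}x = \lambda \int_{0}^{+\infty} x e^{-\lambda x}\,\mathrm{d}x = \left[ -x e^{-\lambda x} \right]_{0}^{+\infty} - \int_{0}^{+\infty} -e^{-\lambda x}\,\mathrm{d}x = \int_{0}^{+\infty} e^{-\lambda x}\,\mathrm{d}x = \left[ -\frac{1}{\lambda}\, e^{-\lambda x} \right]_{0}^{+\infty} = \frac{1}{\lambda}$$

and for variance, using once again the Huygens relation:

$$V(X) = E(X^2) - E(X)^2$$

it remains for us only to calculate:

$$E(X^2) = \int_{0}^{+\infty} \lambda x^2 e^{-\lambda x}\,\mathrm{d}x$$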



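A variable change $y = \lambda x$ leads us to:

$$E(X^2) = \frac{1}{\lambda^2} \int_{0}^{+\infty} y^2 e^{-y}\,\mathrm{d}y$$

A double integration by parts, based on the relation:

$$\int_{a}^{b} f(t)\, g'(t)\,\mathrm{d}t = f(t)\, g(t)\Big|_{a}^{b} - \int_{a}^{b} f'(t)\, g(t)\,\mathrm{d}t$$

gives us:

$$\int_{0}^{+\infty} y^2 e^{-y}\,\mathrm{d}y = \left[ -y^2 e^{-y} \right]_{0}^{+\infty} + 2 \int_{0}^{+\infty} y e^{-y}\,\mathrm{d}y = 2 \left[ -y e^{-y} \right]_{0}^{+\infty} + 2 \int_{0}^{+\infty} e^{-y}\,\mathrm{d}y = 2$$

Therefore:

$$E(X^2) = \frac{2}{\lambda^2} \tag{7.577}$$

we have therefore:

$$V(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2} - \left( \frac{1}{\lambda} \right)^2 = \frac{1}{\lambda^2} \tag{7.578}$$

So the standard deviation (square root of the variance, as a reminder) and the mean have exactly the same expression!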

Here is a plot example of the exponential distribution and cumulative distribution for the parameter $\lambda = 1$:

Figure 4.97 - Exponential law (mass and cumulative distribution function)

Now let us determine the cumulative distribution function of the exponential law:

$$P(X \le x) = \int_{0}^{x} \lambda e^{-\lambda t}\,\mathrm{d}t = \lambda \int_{0}^{x} e^{-\lambda t}\,\mathrm{d}t = 1 - e^{-\lambda x}$$



We will see later that the exponential distribution is a special case of a more general distribution, the chi-square distribution, which is itself a special case of an even more general distribution, the Gamma distribution. This is a very important property used in the "Poisson test" for rare events (see also below).

7.6.15 Cauchy Distribution 

Let $X$, $Y$ be two independent random variables following a reduced centered Normal distribution (with zero mean and unit variance). Thus the density function is given for each variable by:

$$f_X(x) = f_Y(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} \tag{7.580}$$

The random variable:

$$T = \frac{X}{|Y|} \tag{7.581}$$

(the absolute value will be useful in an integral during a change of variable) follows a characteristic distribution named "Cauchy distribution" (or "Cauchy law") or even "Lorentz law".

Let us now determine its density function $f$. To do this, recall that $f$ is determined by the (general) relation:

$$\forall t \in \mathbb{R}, \quad P(T \le t) = \int_{-\infty}^{t} f(x)\,\mathrm{d}x \tag{7.582}$$

So (application of elementary differential calculus):

$$f(t) = \frac{\mathrm{d}}{\mathrm{d}t} P(T \le t) \tag{7.583}$$

in the case where $f$ is continuous.

Since $X$ and $Y$ are independent, the density function of the random vector $(X, Y)$ is given by one of the axioms of probabilities (see section Probabilities):

$$(x, y) \mapsto f_X(x) \cdot f_Y(y) \tag{7.584}$$

Therefore:

$$P(T \le t) = P\left( \frac{X}{|Y|} \le t \right) = P(X \le t|Y|) = \iint_{D} f_X(x) \cdot f_Y(y)\,\mathrm{d}x\,\mathrm{d}y \tag{7.585}$$

where $D = \{(x, y) \mid x \le t|y|\}$. This last integral becomes:

$$\iint_{D} f_X(x) \cdot f_Y(y)\,\mathrm{d}x\,\mathrm{d}y = \int_{-\infty}^{+\infty} \int_{-\infty}^{t|y|} f_X(x) \cdot f_Y(y)\,\mathrm{d}x\,\mathrm{d}y \tag{7.586}$$


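Let us make the following change of variables $x = u|y|$ in the inner integral. We obtain:

$$P(T \le t) = \int_{-\infty}^{+\infty} \int_{-\infty}^{t} f_X(u \cdot |y|) \cdot f_Y(y)\, |y|\,\mathrm{d}u\,\mathrm{d}y = \int_{-\infty}^{t} \int_{-\infty}^{+\infty} f_X(u \cdot |y|) \cdot f_Y(y)\, |y|\,\mathrm{d}y\,\mathrm{d}u \tag{7.587}$$

Therefore:

$$f(t) = \frac{\mathrm{d}}{\mathrm{d}t} P(T \le t) = \int_{-\infty}^{+\infty} f_X(t \cdot |y|) \cdot f_Y(y)\, |y|\,\mathrm{d}y = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-\frac{y^2 (t^2 + 1)}{2}}\, |y|\,\mathrm{d}y \tag{7.588}$$

Now the absolute value will be useful to write (since $|y| = y$ for $y \ge 0$ and $|y| = y + 2(-y)$ for $y < 0$):

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-\frac{y^2 (t^2 + 1)}{2}}\, |y|\,\mathrm{d}y = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-\frac{y^2 (t^2 + 1)}{2}}\, y\,\mathrm{d}y + \frac{2}{2\pi} \int_{-\infty}^{0} e^{-\frac{y^2 (t^2 + 1)}{2}}\, (-y)\,\mathrm{d}y \tag{7.589}$$

For the first integral we have:

$$\int_{-\infty}^{+\infty} e^{-\frac{y^2 (t^2 + 1)}{2}}\, y\,\mathrm{d}y = \left[ -\frac{e^{-\frac{y^2 (t^2 + 1)}{2}}}{t^2 + 1} \right]_{-\infty}^{+\infty} = 0 \tag{7.590}$$

It remains therefore only the second integral and, making the change of variable $v = y^2$, we get:

$$f(t) = \frac{1}{2\pi} \int_{0}^{+\infty} e^{-\frac{v (t^2 + 1)}{2}}\,\mathrm{d}v = \frac{1}{2\pi} \left[ -\frac{2\, e^{-\frac{v (t^2 + 1)}{2}}}{t^2 + 1} \right]_{0}^{+\infty} = \frac{1}{\pi (t^2 + 1)}$$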

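We will denote this thereafter (to respect the notations adopted so far):

$$P(x) = \frac{1}{\pi (x^2 + 1)}$$

which is simply the so-called Cauchy distribution. It is also effectively a distribution function because it verifies (see section Differential and Integral Calculus):

$$\int_{-\infty}^{+\infty} P(x)\,\mathrm{d}x = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{1}{1 + x^2}\,\mathrm{d}x = \frac{1}{\pi} \left( \arctan(+\infty) - \arctan(-\infty) \right) = \frac{1}{\pi} \left( \frac{\pi}{2} - \left( -\frac{\pi}{2} \right) \right) = 1$$

It is obvious that we get therefore for the cumulative distribution function:

$$P(X \le x) = \int_{-\infty}^{x} P(x)\,\mathrm{d}x = \frac{1}{\pi} \int_{-\infty}^{x} \frac{1}{x^2 + 1}\,\mathrm{d}x = \frac{1}{\pi} \left( \arctan(x) - \arctan(-\infty) \right) = \frac{1}{\pi} \left( \arctan(x) - \left( -\frac{\pi}{2} \right) \right) = \frac{1}{\pi} \arctan(x) + \frac{1}{2}$$

Here is a plot example of the Cauchy distribution: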



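The Cauchy distribution has for expected mean:

$$\mu = \int_{-\infty}^{+\infty} x P(x)\,\mathrm{d}x = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{x}{1 + x^2}\,\mathrm{d}x = \frac{1}{\pi} \left( \int_{-\infty}^{0} \frac{x}{1 + x^2}\,\mathrm{d}x + \int_{0}^{+\infty} \frac{x}{1 + x^2}\,\mathrm{d}x \right)$$

$$= \frac{1}{2\pi} \left( \ln(1 + x^2)\Big|_{-\infty}^{0} + \ln(1 + x^2)\Big|_{0}^{+\infty} \right) = \frac{1}{2\pi} \left( -\ln(\infty) + \ln(\infty) \right) = 0$$

Caution!!!! The above calculations do not in fact give zero, because a difference of infinities is not zero but indeterminate! Strictly speaking, the Cauchy distribution therefore does not admit an expected mean!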

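Thus, even if we can build a variance:

$$\sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,\mathrm{d}x = \int_{-\infty}^{+\infty} x^2 P(x)\,\mathrm{d}x = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{x^2}{1 + x^2}\,\mathrm{d}x = \frac{1}{\pi} \int_{-\infty}^{+\infty} \left( 1 - \frac{1}{1 + x^2} \right) \mathrm{d}x$$

$$= \frac{2}{\pi} \lim_{t \to +\infty} \int_{0}^{t} \left( 1 - \frac{1}{1 + x^2} \right) \mathrm{d}x = \frac{2}{\pi} \lim_{t \to +\infty} \left( t - \arctan(t) \right) = +\infty$$

this is absurd and, strictly speaking, the variance does not exist since the mean doesn't exist...!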

The Cauchy distribution is used a lot in financial engineering as it is heavy tailed and therefore a very good candidate to be more accurate in predicting extreme values, as opposed to the Normal distribution whose tails decrease too quickly. Furthermore, the Cauchy distribution is a heavy tailed law with support on $\mathbb{R}$, whereas the Pareto distribution (also heavy tailed) is defined only on $\mathbb{R}^+$.


The Cauchy distribution is one of the most famous distribution functions that... we cannot find in spreadsheet software like Microsoft Excel. To get the closed form of the inverse Cauchy CDF we start from the CDF proven previously:

$$P(X \leq x) = \frac{1}{\pi}\arctan(x) + \frac{1}{2}$$

and therefore if we let:

$$y = \frac{1}{\pi}\arctan(x) + \frac{1}{2}$$

we immediately get the inverse CDF:

$$x = \tan\left(\pi\left(y - \frac{1}{2}\right)\right)$$

That is useful in finance as we know (see section Theoretical Computing) that to simulate a Cauchy variable we can then use inverse transform sampling:

$$X = \tan\left(\pi\left(U - \frac{1}{2}\right)\right), \qquad U \text{ uniform on } [0,1]$$
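
As an illustration, here is a minimal Python sketch of this inverse transform sampling. The function names and the fixed seed are assumptions made for the example; only the two formulas (CDF and inverse CDF) come from the text above.

```python
import math
import random

def cauchy_cdf(x: float) -> float:
    """CDF of the standard Cauchy distribution: arctan(x)/pi + 1/2."""
    return math.atan(x) / math.pi + 0.5

def cauchy_inverse_cdf(y: float) -> float:
    """Inverse CDF (quantile function): tan(pi * (y - 1/2)), for 0 < y < 1."""
    return math.tan(math.pi * (y - 0.5))

def sample_cauchy(n: int, seed: int = 0) -> list[float]:
    """Draw n Cauchy variates by inverse transform sampling.
    The seed argument only makes this sketch reproducible."""
    rng = random.Random(seed)
    return [cauchy_inverse_cdf(rng.random()) for _ in range(n)]

samples = sample_cauchy(100_000)
# Empirical check: about 75% of the draws should fall below the
# theoretical upper quartile cauchy_inverse_cdf(0.75) = 1.
share_below_one = sum(s <= 1.0 for s in samples) / len(samples)
print(round(share_below_one, 3))
```

Checking quantiles rather than the sample mean is deliberate: as shown above, the mean of a Cauchy variable does not exist, so sample averages do not stabilize.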


7.6.16 Beta Distribution 

Let us first recall that the Euler Gamma function is defined by the relation (see section Differential And Integral Calculus):

$$\Gamma(z) = \int_{0}^{+\infty} e^{-x} x^{z-1}\,\mathrm{d}x \qquad (7.601)$$

We proved (see section Differential And Integral Calculus) that a non-trivial property of this function is:

$$\Gamma(z+1) = z\,\Gamma(z)$$

Let us now write:

$$\Gamma(a)\Gamma(b) = \lim_{R\to+\infty}\iint_{A_R} e^{-x-y} x^{a-1} y^{b-1}\,\mathrm{d}x\,\mathrm{d}y$$

where:

$$A_R = \{(x,y) \mid x \geq 0,\ y \geq 0,\ x+y \leq R\}$$

By the change of variables:

$$u = x + y, \qquad v = y$$

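we get:

$$\Gamma(a)\Gamma(b) = \lim_{R\to+\infty}\iint_{A_R} e^{-x-y} x^{a-1} y^{b-1}\,\mathrm{d}x\,\mathrm{d}y = \lim_{R\to+\infty}\int_{0}^{R} e^{-u}\left(\int_{0}^{u} (u-v)^{a-1} v^{b-1}\,\mathrm{d}v\right)\mathrm{d}u$$

For the internal integral we now use the substitution $v = ut$, $0 \leq t \leq 1$, and therefore we find:

$$\Gamma(a)\Gamma(b) = \lim_{R\to+\infty}\int_{0}^{R} e^{-u}\left(\int_{0}^{u} (u-v)^{a-1} v^{b-1}\,\mathrm{d}v\right)\mathrm{d}u = \lim_{R\to+\infty}\int_{0}^{R} e^{-u} u^{a+b-1}\,\mathrm{d}u \int_{0}^{1} (1-t)^{a-1} t^{b-1}\,\mathrm{d}t$$

$$= B(a,b)\int_{0}^{+\infty} e^{-u} u^{a+b-1}\,\mathrm{d}u = B(a,b)\,\Gamma(a+b) \qquad (7.607)$$

The function $B$ that appears in the expression above is named "beta function" and therefore we have:

$$B(a,b) = \int_{0}^{1} (1-t)^{a-1} t^{b-1}\,\mathrm{d}t = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$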

Now that we have defined what we name the beta function, consider the two parameters $a > 0$, $b > 0$ and let us define the "beta distribution" (or "beta law") by the relation below (there are several formulations of the beta distribution and a very important one is studied in detail in the section of Quantitative Management):

$$P_{a,b}(x) = \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}\,\mathbf{1}_{]0,1[}(x)$$

where:

$$B(a,b) = \int_{0}^{1} x^{a-1}(1-x)^{b-1}\,\mathrm{d}x$$

We first check that $P_{a,b}(x)$ is effectively a distribution function (without getting into too much detail...):

$$\int_{-\infty}^{+\infty} P_{a,b}(x)\,\mathrm{d}x = \int_{-\infty}^{+\infty} \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}\,\mathbf{1}_{]0,1[}\,\mathrm{d}x = \frac{1}{B(a,b)}\int_{0}^{1} x^{a-1}(1-x)^{b-1}\,\mathrm{d}x = \frac{B(a,b)}{B(a,b)} = 1$$

Let us now calculate the expected mean:

$$\mu = \int_{-\infty}^{+\infty} x P_{a,b}(x)\,\mathrm{d}x = \frac{1}{B(a,b)}\int_{0}^{1} x^{a}(1-x)^{b-1}\,\mathrm{d}x = \frac{B(a+1,b)}{B(a,b)} = \frac{\Gamma(a+1)\Gamma(b)}{\Gamma(a+b+1)}\cdot\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} = \frac{a}{a+b}$$

by using the relation:

$$\Gamma(z+1) = z\,\Gamma(z)$$
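and its variance:

$$\sigma^2 = \int_{-\infty}^{+\infty} (x-\mu)^2 f(x)\,\mathrm{d}x = \frac{1}{B(a,b)}\int_{0}^{1} (x-\mu)^2 x^{a-1}(1-x)^{b-1}\,\mathrm{d}x$$

$$= \frac{1}{B(a,b)}\left(\int_{0}^{1} x^{a+1}(1-x)^{b-1}\,\mathrm{d}x - 2\mu\int_{0}^{1} x^{a}(1-x)^{b-1}\,\mathrm{d}x + \mu^2\int_{0}^{1} x^{a-1}(1-x)^{b-1}\,\mathrm{d}x\right)$$

$$= \frac{B(a+2,b) - 2\mu B(a+1,b) + \mu^2 B(a,b)}{B(a,b)} = \frac{B(a+2,b) - 2\mu^2 B(a,b) + \mu^2 B(a,b)}{B(a,b)} = \frac{B(a+2,b) - \mu^2 B(a,b)}{B(a,b)}$$

where we used $B(a+1,b) = \mu B(a,b)$. As we know that:

$$\Gamma(z+1) = z\,\Gamma(z) \quad\text{and}\quad B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$

we find:

$$B(a+2,b) = B(a,b)\,\frac{\mu\,(a+1)}{a+b+1}$$

and therefore:

$$\sigma^2 = \frac{\mu\,(a+1)}{a+b+1} - \mu^2 = \frac{ab}{(a+b)^2(a+b+1)}$$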
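
The two closed forms $\mu = a/(a+b)$ and $\sigma^2 = ab/((a+b)^2(a+b+1))$ can be cross-checked numerically. The sketch below is an illustration only: the helper names are hypothetical and a plain midpoint rule stands in for the exact integrals.

```python
import math

def beta_function(a: float, b: float) -> float:
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta density on ]0, 1[."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)

def moment(order: int, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of x^order * pdf on ]0, 1[."""
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** order * beta_pdf((i + 0.5) * h, a, b)
                   for i in range(steps))

a, b = 2.0, 3.0
mean = moment(1, a, b)
variance = moment(2, a, b) - mean ** 2
print(mean, a / (a + b))                               # numerical vs closed form
print(variance, a * b / ((a + b) ** 2 * (a + b + 1)))  # numerical vs closed form
```

For $(a, b) = (2, 3)$ the closed forms give $\mu = 0.4$ and $\sigma^2 = 0.04$, and the numerical values should agree to several decimals.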


Examples of plots of the beta distribution function for (a, b) = (0.1, 0.5) in red, (a, b) = (0.3, 0.5) in green, (a, b) = (0.5, 0.5) in black, (a, b) = (0.8, 0.8) in blue, (a, b) = (1, 1) in magenta, (a, b) = (1, 1.5) in cyan, (a, b) = (1, 2) in gray, (a, b) = (1.5, 2) in turquoise, (a, b) = (2, 2) in yellow, (a, b) = (3, 3) in gold color:

Here is a plot example of the beta distribution and cumulative distribution for the parameters (a, b) = (2, 3):

Figure 4.100 - Beta law (mass and cumulative distribution function)

7.6.17 Gamma Distribution 

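The Euler Gamma function being known, consider two parameters $a > 0$, $\lambda > 0$ and let us define the "Gamma distribution" (or "Gamma law") as given by the relation (density function):

$$P_{a,\lambda}(x) = \frac{x^{a-1} e^{-\lambda x}}{\displaystyle\int_{0}^{+\infty} x^{a-1} e^{-\lambda x}\,\mathrm{d}x}\,\mathbf{1}_{[0,+\infty[}$$

By the change of variables $t = \lambda x$ we obtain:

$$\int_{0}^{+\infty} x^{a-1} e^{-\lambda x}\,\mathrm{d}x = \frac{1}{\lambda^{a}}\int_{0}^{+\infty} t^{a-1} e^{-t}\,\mathrm{d}t = \frac{\Gamma(a)}{\lambda^{a}}$$

and we can then write the relation in a more conventional form that we find frequently in the literature:

$$P_{a,\lambda}(x) = \frac{\lambda^{a} x^{a-1} e^{-\lambda x}}{\Gamma(a)}\,\mathbf{1}_{[0,+\infty[}$$

and it is under this notation that we find this distribution function in Microsoft Excel 11.8346 under the name GAMMADIST( ) and its inverse by GAMMAINV( ).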

Let us now see a simple property of the Gamma distribution that will be partially useful for the study of the Welch statistical test. First recall that we have shown above that, for $Y = cX$ with a constant $c > 0$:

$$\forall y \in \mathbb{R} \qquad f_Y(y) = \frac{1}{c}\,f_X\!\left(\frac{y}{c}\right) \qquad (7.621)$$

Let us write $Y = cX$, where $c$ is a constant and $X$ follows a Gamma distribution; then we have immediately:

$$f_Y(y) = \frac{1}{c}\,P_{a,\lambda}\!\left(\frac{y}{c}\right) = P_{a,\lambda/c}(y) \qquad (7.622)$$

So multiplying by a constant a random variable that follows a Gamma distribution has the sole effect of dividing the parameter $\lambda$ by the same constant. This is the reason why $\lambda$ is named "scale parameter".

If $a \in \mathbb{N}$, the Gamma function in the denominator becomes (see section Differential And Integral Calculus) the factorial $(a-1)!$. The Gamma distribution can then be written:

$$P_{a,\lambda}(x) = \frac{x^{a-1}\lambda^{a} e^{-\lambda x}}{(a-1)!} = \frac{(\lambda x)^{a-1}}{(a-1)!}\,\lambda e^{-\lambda x}$$

This particular form of the Gamma distribution is named the "Erlang distribution"; we find it naturally in the theory of queues and it is very important in practice!

Then we check, with a reasoning similar to that of the beta distribution, that $P_{a,\lambda}(x)$ is a distribution function:

$$\int_{-\infty}^{+\infty} P_{a,\lambda}(x)\,\mathrm{d}x = 1$$

Examples of plots of the Gamma distribution function for (a, λ) = (0.5, 1) in red, (a, λ) = (1, 1) in green, (a, λ) = (2, 1) in black, (a, λ) = (4, 2) in blue, (a, λ) = (16, 8) in magenta:

and a plot example of the Gamma distribution and cumulative distribution for the parameters (a, λ) = (4, 1):

Figure 4.102 - Gamma law (mass and cumulative distribution function)

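The Gamma distribution has also for expected mean:

$$\mu = \int_{-\infty}^{+\infty} x f(x)\,\mathrm{d}x = \frac{\lambda^{a}}{\Gamma(a)}\int_{0}^{+\infty} x^{a} e^{-\lambda x}\,\mathrm{d}x = \frac{\lambda^{a}}{\Gamma(a)}\cdot\frac{\Gamma(a+1)}{\lambda^{a+1}} = \frac{\lambda^{a}}{\Gamma(a)}\cdot\frac{a\,\Gamma(a)}{\lambda^{a+1}} = \frac{a}{\lambda}$$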

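and for variance:

$$\sigma^2 = \int_{-\infty}^{+\infty} (x-\mu)^2 f(x)\,\mathrm{d}x = \frac{\lambda^{a}}{\Gamma(a)}\int_{0}^{+\infty} (x-\mu)^2 x^{a-1} e^{-\lambda x}\,\mathrm{d}x$$

$$= \frac{\lambda^{a}}{\Gamma(a)}\left(\int_{0}^{+\infty} x^{a+1} e^{-\lambda x}\,\mathrm{d}x - 2\mu\int_{0}^{+\infty} x^{a} e^{-\lambda x}\,\mathrm{d}x + \mu^2\int_{0}^{+\infty} x^{a-1} e^{-\lambda x}\,\mathrm{d}x\right)$$

$$= \frac{\lambda^{a}}{\Gamma(a)}\left(\frac{\Gamma(a+2)}{\lambda^{a+2}} + \mu^2\frac{\Gamma(a)}{\lambda^{a}} - 2\mu\frac{\Gamma(a+1)}{\lambda^{a+1}}\right) = \frac{1}{\lambda^2\,\Gamma(a)}\bigl(\Gamma(a+2) + a^2\,\Gamma(a) - 2a\,\Gamma(a+1)\bigr)$$

$$= \frac{1}{\lambda^2\,\Gamma(a)}\bigl((a+1)a\,\Gamma(a) + a^2\,\Gamma(a) - 2a^2\,\Gamma(a)\bigr) = \frac{a}{\lambda^2}$$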
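
Both results rest on the identity $\int_0^{+\infty} x^{a+k-1} e^{-\lambda x}\,\mathrm{d}x = \Gamma(a+k)/\lambda^{a+k}$. As a quick sanity check, the following sketch (the function name `gamma_raw_moment` is an assumption of this example) evaluates the raw moments $E(X^k) = \Gamma(a+k)/(\Gamma(a)\lambda^k)$ and compares them with $a/\lambda$ and $a/\lambda^2$:

```python
import math

def gamma_raw_moment(k: int, a: float, lam: float) -> float:
    """E[X^k] for the Gamma law of parameters (a, lam):
    Gamma(a + k) / (Gamma(a) * lam^k)."""
    return math.gamma(a + k) / (math.gamma(a) * lam ** k)

a, lam = 4.0, 2.0
mean = gamma_raw_moment(1, a, lam)
variance = gamma_raw_moment(2, a, lam) - mean ** 2
print(mean, a / lam)           # mean vs a / lambda
print(variance, a / lam ** 2)  # variance vs a / lambda^2
```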

Let us now prove a property of the Gamma distribution that will permit us to establish later in this chapter, during our study of the analysis of variance and confidence intervals based on small samples, another extremely important property of the Chi-square distribution.

As we know, the density function of a random variable following a Gamma distribution of parameters $a, \lambda > 0$ is:

$$P_{a,\lambda}(x) = f(x) = \frac{\lambda^{a}}{\Gamma(a)}\,e^{-\lambda x} x^{a-1}\,\mathbf{1}_{[0,+\infty[}$$

with (see section Differential And Integral Calculus) the Euler Gamma function:

$$\Gamma(a) = \int_{0}^{+\infty} e^{-x} x^{a-1}\,\mathrm{d}x$$

Moreover, when a random variable follows a Gamma distribution we often denote it in the following way:

$$X = \gamma(a, \lambda) \qquad (7.629)$$

Let $X$, $Y$ be two independent variables. We will prove that if $X = \gamma(p, \lambda)$ and $Y = \gamma(q, \lambda)$, hence with the same scale parameter, then:

$$X + Y = \gamma(p + q, \lambda) \qquad (7.630)$$

We write $f$ the density function of the pair $(X, Y)$, $f_X$ the density function of $X$ and $f_Y$ the density function of $Y$. Because $X$ and $Y$ are independent, we have:

$$f(x,y) = f_X(x)\cdot f_Y(y) \qquad (7.631)$$

for all $x, y > 0$.

Let $Z = X + Y$. The distribution function of $Z$ is therefore:

$$F(z) = P(Z \leq z) = P(X + Y \leq z) = \iint_{D} f(x,y)\,\mathrm{d}x\,\mathrm{d}y \qquad (7.632)$$

where $D = \{(x,y) \mid x + y \leq z\}$.


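As we already know, we name such a calculation a "convolution", and statisticians often have to handle such entities because they work on many random variables that they have to sum or even to multiply.

That is:

$$F(z) = \int_{-\infty}^{+\infty}\int_{-\infty}^{z-x} f_X(x) f_Y(y)\,\mathrm{d}y\,\mathrm{d}x$$

We perform the following change of variable $x = x$, $y = s - x$. The Jacobian is therefore (see Differential And Integral Calculus):

$$J = \begin{vmatrix} \dfrac{\partial x}{\partial x} & \dfrac{\partial x}{\partial s} \\[2mm] \dfrac{\partial y}{\partial x} & \dfrac{\partial y}{\partial s} \end{vmatrix} = \begin{vmatrix} 1 & 0 \\ -1 & 1 \end{vmatrix} = 1$$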

Therefore with the new integration limit $s = x + y = x + (z - x) = z$ we have:

$$F(z) = \int_{-\infty}^{+\infty}\int_{-\infty}^{z} f_X(x) f_Y(s-x)\,\mathrm{d}s\,\mathrm{d}x = \int_{-\infty}^{z}\int_{-\infty}^{+\infty} f_X(x) f_Y(s-x)\,\mathrm{d}x\,\mathrm{d}s$$

If we denote by $g$ the density function of $Z$ we have:

$$F(z) = \int_{-\infty}^{z}\int_{-\infty}^{+\infty} f_X(x) f_Y(s-x)\,\mathrm{d}x\,\mathrm{d}s = \int_{-\infty}^{z} g(s)\,\mathrm{d}s$$

Then it follows:

$$g(s) = \int_{-\infty}^{+\infty} f_X(x) f_Y(s-x)\,\mathrm{d}x$$

$f_X$ and $f_Y$ being null when the argument is negative, we can change the limits of integration:

$$g(s) = \int_{0}^{s} f_X(x) f_Y(s-x)\,\mathrm{d}x \quad\text{for } s \geq 0 \qquad (7.638)$$

Let us calculate $g$:

$$g(s) = \frac{\lambda^{p+q} e^{-\lambda s}}{\Gamma(p)\Gamma(q)}\int_{0}^{s} x^{p-1}(s-x)^{q-1}\,\mathrm{d}x$$

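After the change of variable $x = st$ we obtain:

$$g(s) = \frac{\lambda^{p+q} e^{-\lambda s}}{\Gamma(p)\Gamma(q)}\,s^{p+q-1}\int_{0}^{1} t^{p-1}(1-t)^{q-1}\,\mathrm{d}t = \frac{\lambda^{p+q} e^{-\lambda s}}{\Gamma(p)\Gamma(q)}\,s^{p+q-1}\,B(p,q)$$

where $B$ is the beta function we saw earlier in our study of the beta distribution. But we have also proved the relation:

$$B(p,q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}$$

Therefore:

$$g(s) = \frac{\lambda^{p+q} e^{-\lambda s}}{\Gamma(p+q)}\,s^{p+q-1}$$

More explicitly:

$$g(x,y) = \frac{\lambda^{p+q} e^{-\lambda(x+y)}}{\Gamma(p+q)}\,(x+y)^{p+q-1} \qquad (7.643)$$

Which finally gives us:

$$\int_{0}^{S} g(s)\,\mathrm{d}s = \int_{0}^{S} \frac{\lambda^{p+q} e^{-\lambda s}}{\Gamma(p+q)}\,s^{p+q-1}\,\mathrm{d}s \qquad (7.644)$$

This shows that if two independent random variables follow Gamma distributions with the same parameter $\lambda$, then their sum also follows a Gamma distribution with parameters:

$$X + Y = \gamma(p + q, \lambda) \qquad (7.645)$$

So the Gamma distribution is stable by addition, as are all distributions arising from the Gamma distribution that we will see below.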
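
The stability by addition can be checked numerically by evaluating the convolution integral $g(s) = \int_0^s f_X(x) f_Y(s-x)\,\mathrm{d}x$ directly and comparing it with the Gamma density of parameters $(p+q, \lambda)$. The following is only a sketch: the function names, the midpoint rule and the chosen parameter values are assumptions of this example.

```python
import math

def gamma_pdf(x: float, a: float, lam: float) -> float:
    """Gamma density lam^a x^(a-1) e^(-lam x) / Gamma(a) for x > 0, else 0."""
    if x <= 0.0:
        return 0.0
    return lam ** a * x ** (a - 1) * math.exp(-lam * x) / math.gamma(a)

def convolution(s: float, p: float, q: float, lam: float, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of g(s) = integral_0^s f_X(x) f_Y(s - x) dx."""
    h = s / steps
    return h * sum(gamma_pdf((i + 0.5) * h, p, lam) * gamma_pdf(s - (i + 0.5) * h, q, lam)
                   for i in range(steps))

p, q, lam = 2.0, 3.0, 1.5
for s in (0.5, 2.0, 5.0):
    # The two printed densities should agree to several decimals.
    print(s, convolution(s, p, q, lam), gamma_pdf(s, p + q, lam))
```

Note that both variables share the same $\lambda$; with different values of $\lambda$ the convolution is no longer a Gamma density.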

7.6.18 Generalized Gamma Distribution 

The generalized gamma distribution is a continuous probability distribution with three param- 
eters. It is a generalization of the two-parameter gamma distribution. Since many distributions 
commonly used for parametric models in survival analysis (such as the Exponential distribu- 
tion, the Weibull distribution and the Gamma distribution, and lognormal) are special cases of 
the generalized gamma, it is sometimes used to determine which parametric model is appropri- 
ate for a given set of data. 

Therefore let us notice that if we write after trials and errors the following density function 
named "generalized Gamma law": 






with $x > 0$, $a > 0$, $\eta > 0$, $k > 0$.

Then, for $k = 1$ we fall back on the density function of the Weibull law (see section Industrial Engineering), which with the notations of the corresponding section is given by:

$$f(x) = \frac{\alpha}{\beta}\,\frac{x^{\alpha-1}}{\beta^{\alpha-1}}\,e^{-\left(\frac{x}{\beta}\right)^{\alpha}} = \frac{\alpha}{\beta^{\alpha}}\,x^{\alpha-1}\,e^{-\left(\frac{x}{\beta}\right)^{\alpha}}$$

For $\eta = 1$, we fall back on the Gamma density function introduced just before:

$$f(x) = \frac{\lambda^{a} x^{a-1} e^{-\lambda x}}{\Gamma(a)}\,\mathbf{1}_{[0,+\infty[}$$

For $k = 1$ and $\eta = 1$, we fall back on the exponential distribution also seen previously:

$$f(x) = \lambda e^{-\lambda x}\cdot\mathbf{1}_{[0,+\infty[} \qquad (7.649)$$

and finally for $\eta \to 0$, $k \to +\infty$ we fall back on a log-normal distribution after developing the limits using the Stirling, L'Hôpital and Taylor techniques (see sections Functional Analysis, Sequences and Series, Differential and Integral Calculus):

$$f(x) = \frac{1}{\sigma x\sqrt{2\pi}}\,e^{-\frac{(\ln(x)-\mu)^2}{2\sigma^2}} \qquad (7.650)$$

As always, on request we can detail the developments! 

7.6.19 Chi-Square (Pearson) Distribution 

The "chi-square distribution" (also called "chi-square law" or "Pearson law") has a very important place in industrial practice for some common hypothesis tests (see far below...) and is by definition only a particular case of the Gamma distribution in the case where $a = k/2$ and $\lambda = 1/2$, when $k$ is a positive integer:

$$P_k(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{\frac{k}{2}-1} e^{-\frac{x}{2}}\cdot\mathbf{1}_{[0,+\infty[}$$

This relation between the chi-square distribution and the Gamma distribution is important in Microsoft Excel 11.8346, as the function CHIDIST( ) returns the confidence level (the right-tail probability) and not the distribution function. You must then use the function GAMMADIST( ) with the parameters given above (except that Excel expects the inverse of 1/2, that is 2, as parameter) to get the density and cumulative functions.

The reader who wishes to check that the Chi-square distribution is only a special case of the Gamma distribution can write in Microsoft Excel 14.0.6123:

=GAMMA.DIST(x,k/2,2,TRUE)
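
Outside a spreadsheet, the same check can be sketched in a few lines of Python. This is only an illustration with hypothetical function names: it compares the chi-square density with $k$ degrees of freedom to the Gamma density of parameters $a = k/2$, $\lambda = 1/2$, in the parametrization used in this section.

```python
import math

def gamma_pdf(x: float, a: float, lam: float) -> float:
    """Gamma density in the (a, lambda) parametrization of this section."""
    return lam ** a * x ** (a - 1) * math.exp(-lam * x) / math.gamma(a)

def chi2_pdf(x: float, k: int) -> float:
    """Chi-square density with k degrees of freedom."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

for k in (1, 2, 3, 4, 10):
    for x in (0.5, 1.0, 3.0, 7.5):
        # The chi-square law is the Gamma law with a = k/2 and lambda = 1/2.
        assert math.isclose(chi2_pdf(x, k), gamma_pdf(x, k / 2, 0.5), rel_tol=1e-12)
print("chi-square(k) matches Gamma(k/2, 1/2) on all tested points")
```

Excel's scale argument is $1/\lambda = 2$, which is why the value 2 rather than 1/2 appears in the spreadsheet formula.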


All calculations made previously still apply and we get immediately:

$$\mu = k, \qquad \sigma^2 = 2k \qquad (7.652)$$

Examples of plots of the chi-2 distribution function for k = 1 in red, k = 3 in green, in black, k = 4 in blue:

and a plot example of the chi-2 distribution and cumulative distribution for the parameter k = 2:

Figure 4.104 - Chi-2 ($\chi^2$) law (mass and cumulative distribution function)

In the literature, it is traditional to write:

$$X = \chi^2_k \quad\text{or}\quad X = \chi^2(k) \qquad (7.653)$$

to indicate that the distribution of the random variable $X$ is a chi-square distribution. Furthermore it is common to name the parameter $k$ "degree of freedom" and abbreviate it "df".

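The $\chi^2$ distribution is therefore a special case of the Gamma distribution and by taking $k = 2$ we also find the exponential distribution (see above) for $\lambda = 1/2$:

$$P_2(x) = \frac{1}{2\,\Gamma(1)}\,x^{0} e^{-\frac{x}{2}}\cdot\mathbf{1}_{[0,+\infty[} = \frac{1}{2}\,e^{-\frac{x}{2}}\,\mathbf{1}_{[0,+\infty[} \qquad (7.654)$$

Moreover, since (see section Differential And Integral Calculus):

$$\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi} \qquad (7.655)$$

the $\chi^2$ distribution with $k$ equal to unity can be written as:

$$P_1(x) = \frac{1}{2^{1/2}\,\Gamma(1/2)}\,x^{\frac{1}{2}-1} e^{-x/2}\cdot\mathbf{1}_{[0,+\infty[} = \frac{1}{\sqrt{2\pi x}}\,e^{-x/2}\cdot\mathbf{1}_{[0,+\infty[} \qquad (7.656)$$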

Finally, let us finish with a fairly important property in the field of statistical tests that we will investigate a little further, particularly for confidence intervals of rare events and the famous Fisher method for multiple p-value tests. Indeed, the reader can check in a spreadsheet software like Microsoft Excel 14.0.6123 that the following expressions return the same value (the last one only in the special case x = 0):

=POISSON.DIST(x,μ,TRUE) with x ∈ ℕ
=1-CHISQ.DIST(2μ,2(x+1),TRUE)
=1-GAMMA.DIST(2μ,x+1,2,TRUE)
=1-EXPON.DIST(2μ,0.5,TRUE)

So we need to prove this relation between the $\chi^2$ and Poisson distributions. Let us start from the Gamma distribution:

$$P_{a,\lambda}(x) = \frac{\lambda^{a} x^{a-1} e^{-\lambda x}}{\Gamma(a)}\,\mathbf{1}_{[0,+\infty[}$$

If we write $\lambda = 1/2$ and $a = k/2$ then we have the $\chi^2$ distribution with $k$ degrees of freedom:

$$P_k(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2-1} e^{-x/2}\,\mathbf{1}_{]0,+\infty[}$$


Now remember that we have seen in the section Sequences And Series the following Taylor (Maclaurin) series of order $n-1$ around 0, evaluated at $\lambda$, with integral remainder:

$$e^{\lambda} = \sum_{k=0}^{n-1}\frac{\lambda^{k}}{k!} + \int_{0}^{\lambda}\frac{(\lambda-t)^{n-1}}{(n-1)!}\,e^{t}\,\mathrm{d}t \underset{u=\lambda-t}{=} \sum_{k=0}^{n-1}\frac{\lambda^{k}}{k!} + \int_{0}^{\lambda}\frac{u^{n-1}}{(n-1)!}\,e^{\lambda-u}\,\mathrm{d}u$$

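We multiply by $e^{-\lambda}$:

$$e^{\lambda}e^{-\lambda} = 1 = \sum_{k=0}^{n-1}\frac{\lambda^{k}}{k!}\,e^{-\lambda} + \int_{0}^{\lambda}\frac{u^{n-1}}{(n-1)!}\,e^{-u}\,\mathrm{d}u$$

And therefore:

$$\sum_{k=0}^{n-1}\frac{\lambda^{k}}{k!}\,e^{-\lambda} = 1 - \int_{0}^{\lambda}\frac{u^{n-1} e^{-u}}{(n-1)!}\,\mathrm{d}u$$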

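Now, let us focus on the term:

$$\int_{0}^{\lambda}\frac{u^{n-1} e^{-u}}{(n-1)!}\,\mathrm{d}u$$

and make a first change of variable:

$$\int_{0}^{\lambda}\frac{e^{-u} u^{n-1}}{(n-1)!}\,\mathrm{d}u \underset{u=x/2}{=} \int_{0}^{2\lambda}\frac{x^{n-1} e^{-x/2}}{2^{n-1}(n-1)!}\,\frac{1}{2}\,\mathrm{d}x = \int_{0}^{2\lambda}\frac{1}{2^{n}(n-1)!}\,x^{n-1} e^{-x/2}\,\mathrm{d}x$$

and a second change of variable (caution! the $k$ in the change of variable is not the same as the one in the Poisson sum...):

$$\int_{0}^{2\lambda}\frac{1}{2^{n}(n-1)!}\,x^{n-1} e^{-x/2}\,\mathrm{d}x \underset{n=k/2}{=} \int_{0}^{2\lambda}\frac{1}{2^{k/2}\left(\frac{k}{2}-1\right)!}\,x^{k/2-1} e^{-x/2}\,\mathrm{d}x$$

However, we have shown in the section of Differential And Integral Calculus that if $x$ is a positive integer:

$$x! = \Gamma(x+1)$$

Then it comes:

$$\left(\frac{k}{2}-1\right)! = \Gamma\left(\frac{k}{2}\right)$$

Finally we have:

$$\int_{0}^{2\lambda}\frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2-1} e^{-x/2}\,\mathrm{d}x$$

where we find the chi-2 distribution under the integral! So at the end (with $k = 2n$ degrees of freedom under the integral):

$$\sum_{k=0}^{n-1}\frac{\lambda^{k}}{k!}\,e^{-\lambda} = 1 - \int_{0}^{2\lambda}\frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2-1} e^{-x/2}\,\mathrm{d}x$$

This explains the formulas given above for the spreadsheet software.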
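
As a hedged numerical illustration of this identity, the sketch below compares the Poisson cumulative probability $P(N \leq x)$ of mean $\mu$ with one minus the chi-square cumulative function evaluated at $2\mu$ with $2(x+1)$ degrees of freedom. The function names are hypothetical, and the chi-square CDF is obtained by a simple midpoint rule rather than a library routine.

```python
import math

def poisson_cdf(x: int, mu: float) -> float:
    """P(N <= x) for a Poisson variable of mean mu."""
    return sum(mu ** k * math.exp(-mu) / math.factorial(k) for k in range(x + 1))

def chi2_cdf(upper: float, dof: int, steps: int = 200_000) -> float:
    """Midpoint-rule integral of the chi-square density on [0, upper]."""
    norm = 2 ** (dof / 2) * math.gamma(dof / 2)
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** (dof / 2 - 1) * math.exp(-(i + 0.5) * h / 2)
                   for i in range(steps)) / norm

mu, x = 3.2, 4
left = poisson_cdf(x, mu)                       # Poisson side of the identity
right = 1.0 - chi2_cdf(2 * mu, 2 * (x + 1))     # chi-square side, 2(x + 1) df
print(left, right)
```

The correspondence with the proof is $n - 1 = x$ (upper index of the Poisson sum), $\lambda = \mu$ and $k = 2n = 2(x+1)$.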



7.6.20 Student Distribution 

The "Student distribution" (or "Student's law") of parameter $k$ is defined by the relation:

$$T_k(x) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)}\left(1 + \frac{x^2}{k}\right)^{-\frac{k+1}{2}}$$

with $k$ being the degree of freedom of the $\chi^2$ distribution underlying the construction of the Student function as we will see.

Let us indicate that this distribution can also be obtained in Microsoft Excel 11.8346 using the TDIST( ) function and its inverse by TINV( ).

It is indeed a distribution function because it also satisfies (remains to be proved directly, but as we will see it is the product of two distribution functions thus indirectly...):

$$\int_{-\infty}^{+\infty} T_k(x)\,\mathrm{d}x = 1$$

Let us see the easiest proof to justify the provenance of the Student distribution, which will also be very useful further on in statistical inference and analysis of variance.

1. If $X$ and $Y$ are two independent random variables with respective densities $f_X$, $f_Y$, the distribution of the pair $(X, Y)$ has a density $f$ satisfying (axiom of probabilities!):

$$f(x,y) = f_X(x) f_Y(y)$$

2. The distribution $\mathcal{N}(0,1)$ is given by (see above):

$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}$$

3. The distribution $\chi^2_n$ is given by (see above):

$$f(y) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\,y^{\frac{n}{2}-1} e^{-\frac{y}{2}}$$

for $y \geq 0$ and $n \geq 1$.

4. The function $\Gamma$ is defined for all $a > 0$ by (see section Differential and Integral Calculus):

$$\Gamma(a) = \int_{0}^{+\infty} e^{-x} x^{a-1}\,\mathrm{d}x$$

and satisfies (see section Differential and Integral Calculus):

$$\Gamma(a-1) = \frac{\Gamma(a)}{a-1}$$

for $a \geq 2$.

These reminders made, now consider a random variable $X$ that follows the distribution $\mathcal{N}(0,1)$ and $Y$ a random variable following the distribution $\chi^2_n$.

We assume $X$ and $Y$ to be independent and we consider the random variable (this is at the origin of the historical study of the Student distribution in the framework of statistical inference, which led to define this variable, whose origin we will deepen later):

$$T = \sqrt{n}\,\frac{X}{\sqrt{Y}} = \frac{X}{\sqrt{Y/n}} = \frac{\mathcal{N}(0,1)}{\sqrt{\chi^2_n/n}}$$

We will prove that $T$ follows a Student distribution of parameter $n$.

Proof 4.33.4. Let $F$ and $f$ be respectively the repartition (cumulative distribution) and density functions of $T$, and $f_X$, $f_Y$, $f$ the density functions of $X$, $Y$ and $(X, Y)$ respectively. Then we have for all $t \in \mathbb{R}$:

$$F(t) = P(T \leq t) = P\left(\sqrt{n}\,\frac{X}{\sqrt{Y}} \leq t\right) = \iint_{D} f(x,y)\,\mathrm{d}x\,\mathrm{d}y = \iint_{D} f_X(x) f_Y(y)\,\mathrm{d}x\,\mathrm{d}y \qquad (7.679)$$

where:

$$D = \left\{(x,y) \in \mathbb{R}\times\mathbb{R}^{*}_{+} \;\middle|\; x \leq \frac{t\sqrt{y}}{\sqrt{n}}\right\}$$

the imposed positive and non-zero value of $y$ being due to the fact that it is under a root and furthermore in the denominator. Thus:

$$F(t) = \iint_{D} f_X(x) f_Y(y)\,\mathrm{d}x\,\mathrm{d}y = \int_{0}^{+\infty} f_Y(y)\,\mathrm{d}y\int_{-\infty}^{t\sqrt{y}/\sqrt{n}} f_X(x)\,\mathrm{d}x = \int_{0}^{+\infty} f_Y(y)\,\Phi\!\left(\frac{t\sqrt{y}}{\sqrt{n}}\right)\mathrm{d}y$$

where, because $X$ follows a Normal $\mathcal{N}(0,1)$ distribution:

$$\Phi(u) = \int_{-\infty}^{u} f_X(x)\,\mathrm{d}x = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u} e^{-\frac{x^2}{2}}\,\mathrm{d}x$$

is the Normal centered reduced cumulative distribution. Thus, we obtain the density function of $T$ by deriving $F$:

$$f(t) = F'(t) = \int_{0}^{+\infty} f_Y(y)\,f_X\!\left(\frac{t\sqrt{y}}{\sqrt{n}}\right)\frac{\sqrt{y}}{\sqrt{n}}\,\mathrm{d}y \qquad (7.683)$$

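because (the derivative of a composite function is equal to its outer derivative multiplied by its inner derivative):

$$\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{t\sqrt{y}}{\sqrt{n}}\right) = \frac{\sqrt{y}}{\sqrt{n}}$$

Therefore:

$$f(t) = \frac{1}{\sqrt{n}}\int_{0}^{+\infty} f_Y(y)\,f_X\!\left(\frac{t\sqrt{y}}{\sqrt{n}}\right)\sqrt{y}\,\mathrm{d}y = \frac{1}{\sqrt{n}}\int_{0}^{+\infty}\frac{1}{2^{n/2}\,\Gamma(n/2)}\,e^{-y/2}\,y^{n/2-1}\,\frac{1}{\sqrt{2\pi}}\,e^{-\left(t\sqrt{y}/\sqrt{n}\right)^2/2}\,\sqrt{y}\,\mathrm{d}y$$

$$= \frac{1}{\sqrt{n}\,2^{n/2}\,\Gamma(n/2)\sqrt{2\pi}}\int_{0}^{+\infty} e^{-y/2}\,y^{n/2-1}\,e^{-\left(t\sqrt{y}/\sqrt{n}\right)^2/2}\,\sqrt{y}\,\mathrm{d}y = \frac{1}{2^{n/2}\,\Gamma(n/2)\sqrt{2\pi n}}\int_{0}^{+\infty} e^{-\frac{y}{2}\left(1+\frac{t^2}{n}\right)}\,y^{\frac{n+1}{2}-1}\,\mathrm{d}y$$

By making the change of variable:

$$u = \frac{y}{2}\left(1 + \frac{t^2}{n}\right) \;\Leftrightarrow\; y = \frac{2u}{1 + \frac{t^2}{n}}, \qquad \mathrm{d}y = \frac{2}{1 + \frac{t^2}{n}}\,\mathrm{d}u$$

we get:

$$f(t) = \frac{1}{2^{n/2}\,\Gamma(n/2)\sqrt{2\pi n}}\left(\frac{2}{1 + t^2/n}\right)^{\frac{n+1}{2}}\int_{0}^{+\infty} e^{-u}\,u^{(n-1)/2}\,\mathrm{d}u$$

$$= \frac{2^{\frac{n+1}{2}}\,\Gamma\left(\frac{n+1}{2}\right)}{2^{n/2}\,\Gamma(n/2)\sqrt{2\pi}\sqrt{n}}\left(\frac{1}{1 + t^2/n}\right)^{\frac{n+1}{2}} = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma(n/2)\sqrt{\pi n}}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}$$

which is the Student distribution of parameter $n$.

□ Q.E.D.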
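
The key step of the proof, relation (7.683), can be checked numerically: integrating $f_Y(y)\,f_X(t\sqrt{y/n})\,\sqrt{y/n}$ over $y$ should reproduce the closed-form Student density. The sketch below is an illustration under stated assumptions: the function names, the truncation of the integral at `upper` and the midpoint rule are choices made for this example.

```python
import math

def normal_pdf(x: float) -> float:
    """Density of N(0, 1)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def chi2_pdf(y: float, n: int) -> float:
    """Chi-square density with n degrees of freedom (y > 0)."""
    return y ** (n / 2 - 1) * math.exp(-y / 2.0) / (2 ** (n / 2) * math.gamma(n / 2))

def student_pdf(t: float, n: int) -> float:
    """Closed form obtained at the end of the proof."""
    return (math.gamma((n + 1) / 2) / (math.gamma(n / 2) * math.sqrt(n * math.pi))
            * (1.0 + t * t / n) ** (-(n + 1) / 2))

def student_pdf_integral(t: float, n: int, upper: float = 100.0,
                         steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral over y of
    f_Y(y) * f_X(t * sqrt(y / n)) * sqrt(y / n), truncated at `upper`."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        root = math.sqrt(y / n)
        total += chi2_pdf(y, n) * normal_pdf(t * root) * root
    return h * total

n = 4
for t in (0.0, 1.0, 2.5):
    # The two printed values should agree to several decimals.
    print(t, student_pdf_integral(t, n), student_pdf(t, n))
```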


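Let us now determine the mean of the Student distribution:

$$T = \sqrt{n}\,\frac{X}{\sqrt{Y}}$$

We have:

$$E(T) = E\left(\sqrt{n}\,X\right)E\left(\frac{1}{\sqrt{Y}}\right) = \sqrt{n}\,E(X)\,E\left(\frac{1}{\sqrt{Y}}\right)$$

But $E\left(\frac{1}{\sqrt{Y}}\right)$ exists if and only if $n \geq 2$. Indeed:

$$E\left(\frac{1}{\sqrt{Y}}\right) = \int_{0}^{+\infty}\frac{1}{\sqrt{y}}\,f_Y(y)\,\mathrm{d}y = \frac{1}{2^{n/2}\,\Gamma(n/2)}\int_{0}^{+\infty} e^{-\frac{y}{2}}\,y^{\frac{n-3}{2}}\,\mathrm{d}y$$

and for $n = 1$:

$$\int_{0}^{+\infty} e^{-\frac{y}{2}}\,y^{-1}\,\mathrm{d}y = \int_{0}^{+\infty}\frac{e^{-\frac{y}{2}}}{y}\,\mathrm{d}y \geq \int_{0}^{1}\frac{e^{-\frac{y}{2}}}{y}\,\mathrm{d}y \geq e^{-\frac{1}{2}}\int_{0}^{1}\frac{1}{y}\,\mathrm{d}y \to +\infty$$

Whereas for $n \geq 2$ we have:

$$\int_{0}^{+\infty} e^{-\frac{y}{2}}\,y^{\frac{n-3}{2}}\,\mathrm{d}y = 2^{\frac{n-1}{2}}\int_{0}^{+\infty} e^{-u}\,u^{\frac{n-3}{2}}\,\mathrm{d}u = 2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right) < +\infty$$

Thus, for $n = 1$ the mean does not exist.

So for $n \geq 2$:

$$E(T) = \sqrt{n}\,E(X)\,E\left(\frac{1}{\sqrt{Y}}\right) = 0$$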


Now let us see the value of the variance. So we have:

$$V(T) = E(T^2) - E(T)^2$$

First we will discuss the existence of $E(T^2)$. We have trivially:

$$E(T^2) = n\,E\left(\frac{X^2}{Y}\right) = n\,E(X^2)\,E\left(\frac{1}{Y}\right)$$

$X$ follows a Normal centered reduced distribution thus:

$$V(X) = 1 = E(X^2) - E(X)^2 = E(X^2) \;\Rightarrow\; E(X^2) = 1$$

With regard to $E\left(\frac{1}{Y}\right)$ we have:

$$E\left(\frac{1}{Y}\right) = \int_{0}^{+\infty}\frac{1}{y}\,f_Y(y)\,\mathrm{d}y = \frac{1}{2^{n/2}\,\Gamma(n/2)}\int_{0}^{+\infty} e^{-\frac{y}{2}}\,y^{\frac{n}{2}-2}\,\mathrm{d}y = \frac{1}{2\,\Gamma(n/2)}\int_{0}^{+\infty} e^{-u}\,u^{\frac{n}{2}-2}\,\mathrm{d}u = \frac{\Gamma\left(\frac{n}{2}-1\right)}{2\,\Gamma\left(\frac{n}{2}\right)}$$

where we made the change of variable $u = y/2$. But the integral defining $\Gamma\left(\frac{n}{2}-1\right)$ converges only if $n \geq 3$. Therefore $E(T^2)$ exists if and only if $n \geq 3$, and its value is, according to the properties of the Euler Gamma function demonstrated in the chapter of Differential And Integral Calculus:

$$E(T^2) = n\,E\left(\frac{1}{Y}\right) = n\,\frac{\Gamma\left(\frac{n}{2}-1\right)}{2\,\Gamma\left(\frac{n}{2}\right)} = \frac{n}{2\left(\frac{n}{2}-1\right)} = \frac{n}{n-2}$$

Therefore for $n \geq 3$:

$$V(T) = \frac{n}{n-2}$$

It is also important to note that this law is symmetrical about 0!

Plot example of the Student distribution and cumulative distribution for the parameter k = 3:

Figure 4.105 - Student T law (mass and cumulative distribution function)

7.6.21 Fisher Distribution 

The "Fisher distribution" (or "Fisher-Snedecor distribution") of parameters $k$ and $l$ is defined by the relation:

$$F_{k,l}(x) = \frac{\Gamma\left(\frac{k+l}{2}\right)}{\Gamma\left(\frac{k}{2}\right)\Gamma\left(\frac{l}{2}\right)}\left(\frac{k}{l}\right)^{\frac{k}{2}}\frac{x^{\frac{k}{2}-1}}{\left(1 + \frac{k}{l}x\right)^{\frac{k+l}{2}}}$$

if $x \geq 0$. The parameters $k$ and $l$ are positive integers and correspond to the two degrees of freedom of the underlying chi-square distributions. This distribution is often denoted by $F_{k,l}$ or by $F(k,l)$ and can be obtained in Microsoft Excel 11.8346 with the FDIST( ) function. It is indeed a distribution function because it satisfies the property:

$$\int_{0}^{+\infty} F_{k,l}(x)\,\mathrm{d}x = 1$$

Let us see the easiest proof to justify the provenance of the Fisher distribution, which will also be very useful further on in statistical inference and analysis of variance.

For this proof, recall that:

1. The distribution $\chi^2_n$ is given by (see above):

$$f(y) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\,y^{\frac{n}{2}-1} e^{-y/2} \qquad (7.703)$$

for $y \geq 0$ and $n \geq 1$.

2. The Euler Gamma function $\Gamma$ is defined for all $a > 0$ by (see section Differential and Integral Calculus):

$$\Gamma(a) = \int_{0}^{+\infty} e^{-x} x^{a-1}\,\mathrm{d}x$$

Let $X$, $Y$ be two independent random variables following respectively the distributions $\chi^2_n$ and $\chi^2_m$.

We consider the random variable:

$$T = \frac{X/n}{Y/m} = \frac{\chi^2_n/n}{\chi^2_m/m} \qquad (7.705)$$

We will prove that the distribution of $T$ is the Fisher-Snedecor distribution of parameters $n$, $m$.

Let us note for this purpose $F$ and $f$ the cumulative distribution and density functions of $T$, and $f_X$, $f_Y$, $f$ the density functions of $X$, $Y$ and $(X, Y)$ respectively. We have for all $t \in \mathbb{R}$:

$$F(t) = P(T \leq t) = P\left(\frac{X/n}{Y/m} \leq t\right) = \iint_{D} f(x,y)\,\mathrm{d}x\,\mathrm{d}y = \iint_{D} f_X(x) f_Y(y)\,\mathrm{d}x\,\mathrm{d}y \qquad (7.706)$$

where:

$$D = \left\{(x,y) \in \mathbb{R}^{*}_{+}\times\mathbb{R}^{*}_{+} \;\middle|\; x \leq \frac{nt}{m}\,y\right\} \qquad (7.707)$$

where the imposed positive values come from the fact that behind $x$ and $y$ there is a chi-square variable. Therefore:

$$F(t) = \int_{0}^{+\infty} f_Y(y)\,\mathrm{d}y\int_{0}^{\frac{nt}{m}y} f_X(x)\,\mathrm{d}x \qquad (7.708)$$

We obtain the density function of $T$ by deriving $F$ (taking the inner derivative into account):

$$f(t) = F'(t) = \frac{n}{m}\int_{0}^{+\infty} f_Y(y)\,f_X\!\left(\frac{nt}{m}\,y\right) y\,\mathrm{d}y \qquad (7.709)$$

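Then explicitly because:

$$f_Y(y) = \frac{1}{2^{m/2}\,\Gamma(m/2)}\,y^{\frac{m}{2}-1} e^{-\frac{y}{2}} \quad\text{and}\quad f_X\!\left(\frac{nt}{m}\,y\right) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\left(\frac{nt}{m}\,y\right)^{\frac{n}{2}-1} e^{-\frac{nt}{2m}y}$$

we then have:

$$f(t) = \frac{n}{m}\int_{0}^{+\infty}\frac{1}{2^{m/2}\,\Gamma(m/2)}\,e^{-\frac{y}{2}}\,y^{\frac{m}{2}-1}\,\frac{1}{2^{n/2}\,\Gamma(n/2)}\,e^{-\frac{nt}{2m}y}\left(\frac{nt}{m}\,y\right)^{\frac{n}{2}-1} y\,\mathrm{d}y$$

$$= \frac{1}{2^{n/2}\,\Gamma(n/2)\,2^{m/2}\,\Gamma(m/2)}\,\frac{n}{m}\int_{0}^{+\infty} e^{-\frac{y}{2}}\,y^{\frac{m}{2}-1}\,e^{-\frac{nt}{2m}y}\left(\frac{nt}{m}\,y\right)^{\frac{n}{2}-1} y\,\mathrm{d}y$$

$$= \frac{1}{2^{\frac{n+m}{2}}\,\Gamma(n/2)\,\Gamma(m/2)}\,\frac{n}{m}\left(\frac{nt}{m}\right)^{\frac{n}{2}-1}\int_{0}^{+\infty} y^{\frac{n+m}{2}-1}\,e^{-\frac{y}{2}\left(1+\frac{nt}{m}\right)}\,\mathrm{d}y$$

By making the change of variable:

$$u = \frac{y}{2}\left(1 + \frac{nt}{m}\right) \;\Leftrightarrow\; y = \frac{2u}{1 + \frac{nt}{m}}, \qquad \mathrm{d}y = \frac{2}{1 + \frac{nt}{m}}\,\mathrm{d}u$$

we get:

$$f(t) = \frac{1}{2^{\frac{n+m}{2}}\,\Gamma(n/2)\,\Gamma(m/2)}\,\frac{n}{m}\left(\frac{nt}{m}\right)^{\frac{n}{2}-1}\left(\frac{2}{1 + \frac{nt}{m}}\right)^{\frac{n+m}{2}}\int_{0}^{+\infty} u^{\frac{n+m}{2}-1} e^{-u}\,\mathrm{d}u$$

$$= \frac{\Gamma\left(\frac{n+m}{2}\right)}{\Gamma(n/2)\,\Gamma(m/2)}\,\frac{n}{m}\left(\frac{nt}{m}\right)^{\frac{n}{2}-1}\left(1 + \frac{nt}{m}\right)^{-\frac{n+m}{2}} = \frac{\Gamma\left(\frac{n+m}{2}\right)}{\Gamma(n/2)\,\Gamma(m/2)}\left(\frac{n}{m}\right)^{\frac{n}{2}} t^{\frac{n}{2}-1}\left(1 + \frac{nt}{m}\right)^{-\frac{n+m}{2}}$$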



7.6.22 General Folded Normal Distribution 

The "folded Normal distribution" is the distribution of the absolute value of a random variable with a Normal distribution¹. As we have mentioned before, the Normal distribution is perhaps the most important in probability and is used to model an incredible variety of random phenomena. Since one may only be interested in the magnitude of a Normally distributed variable, the folded Normal arises in a very natural way, especially in Finance and Industrial Engineering (Design of Experiments). The name stems from the fact that the probability measure of the Normal distribution on $(-\infty, 0]$ is folded over to $[0, \infty)$.

¹ The majority of the text below comes from http://www.math.uah.edu/stat/

Definitions (#96): Suppose that $X$ has a Normal distribution with mean $\mu \in \mathbb{R}$ and standard deviation $\sigma \in (0, +\infty)$. Then $Y = |X|$ has the folded normal distribution with parameters $\mu$ and $\sigma$.

Suppose that $Z$ follows the standard Normal distribution. Let us recall that then $Z$ has probability density function $\phi$ and distribution function $\Phi$ given by:

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}, \qquad \Phi(z) = \int_{-\infty}^{z}\phi(x)\,\mathrm{d}x = \int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,\mathrm{d}x$$

with $z \in \mathbb{R}$.

If $\mu \in \mathbb{R}$ and $\sigma \in [0, +\infty[$, then $X = \mu + \sigma Z$ has the Normal distribution with mean $\mu$ and standard deviation $\sigma$, and therefore it is obvious at this level that:

$$Y = |X| = |\mu + \sigma Z| \qquad (7.715)$$

has the folded Normal distribution with parameters $\mu$ and $\sigma$.

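Now let us determine the cumulative distribution function (CDF) of such a variable! For $y \in [0, +\infty[$:

$$F(y) = P(Y \leq y) = P(|X| \leq y) = P(|\mu + \sigma Z| \leq y) = P(-y \leq \mu + \sigma Z \leq y)$$

$$= P\left(\frac{-y-\mu}{\sigma} \leq Z \leq \frac{y-\mu}{\sigma}\right) = \Phi\left(\frac{y-\mu}{\sigma}\right) - \Phi\left(\frac{-y-\mu}{\sigma}\right)$$

Since $\Phi(-z) = 1 - \Phi(z)$ we have:

$$F(y) = \Phi\left(\frac{y-\mu}{\sigma}\right) + \Phi\left(\frac{y+\mu}{\sigma}\right) - 1 = \int_{0}^{y}\frac{1}{\sigma\sqrt{2\pi}}\left\{\exp\left[-\frac{1}{2}\left(\frac{x+\mu}{\sigma}\right)^2\right] + \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]\right\}\mathrm{d}x$$

We cannot compute the quantile function $F^{-1}$ in closed form, but values of this function can be approximated numerically.

It comes therefore immediately that $Y$ has probability density function $f$ given by:

$$f(y) = \frac{1}{\sigma}\left[\phi\left(\frac{y-\mu}{\sigma}\right) + \phi\left(\frac{y+\mu}{\sigma}\right)\right] = \frac{1}{\sigma\sqrt{2\pi}}\left\{\exp\left[-\frac{1}{2}\left(\frac{y+\mu}{\sigma}\right)^2\right] + \exp\left[-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2\right]\right\}$$

This follows from differentiating the CDF with respect to $y$ as we know!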


Now, as always in this book, we will focus only on what we need for the applications in the other chapters! As we don't need the moments of the folded Normal distribution, we will not calculate them. The only purpose of the above development was to build the tools needed to introduce a special case of the folded Normal distribution.

Half-normal distribution

In probability theory and statistics, the "half-normal distribution" is a special case of the folded Normal distribution.

Let $X$ follow an ordinary Normal distribution, $\mathcal{N}(0, \sigma^2)$, then $Y = |X|$ follows a half-Normal distribution. Thus, the half-normal distribution is a fold at the mean of an ordinary Normal distribution with mean $\mu = 0$.

Thus, let $Y = |\sigma Z| = \sigma|Z|$ where $Z$ has a standard Normal distribution and $\sigma \in [0, +\infty[$. Clearly $\sigma$ is a scale parameter, unlike the case for the general folded Normal distribution. When $\sigma = 1$, $Y = |Z|$ has the "standard half-normal distribution".

As the half-Normal distribution is just a special case of the folded Normal distribution with $\mu = 0$ it comes immediately:

$$f(y) = \frac{2}{\sigma\sqrt{2\pi}}\,e^{-\frac{y^2}{2\sigma^2}} = \frac{\sqrt{2}}{\sigma\sqrt{\pi}}\,e^{-\frac{y^2}{2\sigma^2}}$$

with $y \in \mathbb{R}^+$; this distribution is sometimes denoted $H\mathcal{N}(0, \sigma^2)$.


Now what interests us for the other chapters of this book (especially Design of Experiments) are the moments of the latter!

To calculate them, first remember that:

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}} \qquad (7.720)$$

with $z \in \mathbb{R}$. Therefore it is immediate that:

$$\phi'(z) = -z\,\phi(z) \qquad (7.721)$$

Also remember that we have already proved that:

$$E(Z) = 0, \qquad E(Z^2) = V(Z) = 1 \qquad (7.722)$$

We first need to determine the moments of the Normal distribution! So for $n \in \mathbb{N}_+$:

$$E(Z^{n+1}) = \int_{-\infty}^{+\infty} z^{n+1}\phi(z)\,\mathrm{d}z = \int_{-\infty}^{+\infty} z^{n}\,z\,\phi(z)\,\mathrm{d}z = -\int_{-\infty}^{+\infty} z^{n}\phi'(z)\,\mathrm{d}z$$

Now we integrate by parts (see section Differential and Integral Calculus), with $u = z^n$ and $\mathrm{d}v = \phi'(z)\,\mathrm{d}z$, to get:

$$E(Z^{n+1}) = -z^{n}\phi(z)\Big|_{-\infty}^{+\infty} + \int_{-\infty}^{+\infty} n z^{n-1}\phi(z)\,\mathrm{d}z = 0 + n\,E(Z^{n-1})$$

Therefore for $n \in \mathbb{N}$, with $n \geq 1$:

$$E(Z^{n+1}) = n\,E(Z^{n-1}) \qquad (7.725)$$

The moments of the standard normal distribution are now easy to compute. First we know that:

• $E(Z) = 0$

• $E(Z^2) = 1$

and then the recurrence gives:

$$\begin{aligned}
E(Z^3) &= E(Z^{2+1}) = 2\cdot E(Z^{2-1}) = 2\cdot E(Z) = 0\\
E(Z^4) &= E(Z^{3+1}) = 3\cdot E(Z^{3-1}) = 3\cdot E(Z^2) = 3\cdot 1 = 1\cdot 3 = 3\\
E(Z^5) &= E(Z^{4+1}) = 4\cdot E(Z^{4-1}) = 4\cdot E(Z^3) = 4\cdot 0 = 0\\
E(Z^6) &= E(Z^{5+1}) = 5\cdot E(Z^{5-1}) = 5\cdot E(Z^4) = 5\cdot 3 = 1\cdot 3\cdot 5 = 15\\
E(Z^7) &= E(Z^{6+1}) = 6\cdot E(Z^{6-1}) = 6\cdot E(Z^5) = 6\cdot 0 = 0\\
E(Z^8) &= E(Z^{7+1}) = 7\cdot E(Z^{7-1}) = 7\cdot E(Z^6) = 7\cdot 15 = 1\cdot 3\cdot 5\cdot 7 = 105
\end{aligned}$$

Therefore we see that for the odd powers, that is to say $Z^{2n+1}$ with $n \in \mathbb{N}$:

$$E(Z^{2n+1}) = 0$$

and for even powers:

$$E(Z^{2n}) = 1\cdot 3\cdot\ldots\cdot(2n-1) = \frac{(2n)!}{n!\,2^{n}}$$

It follows for $X = \mathcal{N}(0, \sigma)$ (just check with the special cases $n = 0$ and $n = 1$):

$$E(X^{2n+1}) = 0, \qquad E(X^{2n}) = \sigma^{2n}\,\frac{(2n)!}{n!\,2^{n}}$$

The moments of the half-normal distribution can now be computed explicitly.

First it should be quite obvious by construction that the even order moments of $Y$ are the same as the even order moments of $\sigma Z$ (in both cases the values are all positive and therefore equal):

$$E(Y^{2n}) = \sigma^{2n}\,\frac{(2n)!}{n!\,2^{n}}$$
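
The recurrence $E(Z^{n+1}) = n\,E(Z^{n-1})$ and the closed form for the even moments of the standard Normal distribution can be cross-checked with a few lines of Python. This is a sketch only; the two function names are assumptions of this example.

```python
import math

def standard_normal_moment(k: int) -> int:
    """E(Z^k) from E(Z^0) = 1, E(Z^1) = 0 and E(Z^(n+1)) = n * E(Z^(n-1))."""
    if k == 0:
        return 1
    if k == 1:
        return 0
    return (k - 1) * standard_normal_moment(k - 2)

def even_moment_closed_form(n: int) -> int:
    """(2n)! / (n! * 2^n) = 1 * 3 * ... * (2n - 1)."""
    return math.factorial(2 * n) // (math.factorial(n) * 2 ** n)

for k in range(9):
    print(k, standard_normal_moment(k))   # 1, 0, 1, 0, 3, 0, 15, 0, 105

# The recurrence and the closed form agree on every even order tested.
assert all(standard_normal_moment(2 * n) == even_moment_closed_form(n)
           for n in range(10))
```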

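For the odd order moments we must use (see above):

$$f(y) = \frac{\sqrt{2}}{\sigma\sqrt{\pi}}\,e^{-\frac{y^2}{2\sigma^2}}$$

with $y \in \mathbb{R}^+$. Therefore by definition:

$$E(Y^{2n+1}) = \int_{0}^{+\infty} y^{2n+1} f(y)\,\mathrm{d}y = \int_{0}^{+\infty} y^{2n+1}\,\frac{\sqrt{2}}{\sigma\sqrt{\pi}}\,e^{-\frac{y^2}{2\sigma^2}}\,\mathrm{d}y$$

that is:

$$E(Y^{2n+1}) = \frac{1}{\sigma}\sqrt{\frac{2}{\pi}}\int_{0}^{+\infty} y^{2n+1}\,e^{-\frac{y^2}{2\sigma^2}}\,\mathrm{d}y$$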


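Now we make the change of variable $u = y^2/(2\sigma^2)$, therefore first we have (this is obvious):

$$e^{-\frac{y^2}{2\sigma^2}} = e^{-u}, \qquad \mathrm{d}u = \frac{2y\,\mathrm{d}y}{2\sigma^2} \;\Leftrightarrow\; \mathrm{d}y = \frac{\sigma^2}{y}\,\mathrm{d}u$$

Therefore we have so far:

$$E(Y^{2n+1}) = \frac{1}{\sigma}\sqrt{\frac{2}{\pi}}\int_{0}^{+\infty} y^{2n+1}\,\frac{\sigma^2}{y}\,e^{-u}\,\mathrm{d}u = \sigma\sqrt{\frac{2}{\pi}}\int_{0}^{+\infty}\left(y^2\right)^{n} e^{-u}\,\mathrm{d}u$$

So finally:

$$E(Y^{2n+1}) = \sigma\sqrt{\frac{2}{\pi}}\int_{0}^{+\infty}\left(2\sigma^2 u\right)^{n} e^{-u}\,\mathrm{d}u = \sigma^{2n+1}\,2^{n}\sqrt{\frac{2}{\pi}}\int_{0}^{+\infty} u^{n} e^{-u}\,\mathrm{d}u$$

We recognize in this expression the Euler Gamma function integral (see section Differential and Integral Calculus)! Therefore it is immediate that:

$$E(Y^{2n+1}) = \sigma^{2n+1}\,2^{n}\sqrt{\frac{2}{\pi}}\,\Gamma(n+1) = \sigma^{2n+1}\,2^{n}\sqrt{\frac{2}{\pi}}\,n!$$

So as summary:

$$E(Y^{2n}) = \sigma^{2n}\,\frac{(2n)!}{n!\,2^{n}}, \qquad E(Y^{2n+1}) = \sigma^{2n+1}\,2^{n}\sqrt{\frac{2}{\pi}}\,n!$$

So finally we get the result we need for some properties of the Brownian motion in finance:

$$E(Y) = E(Y^{2\cdot 0+1}) = \sigma\sqrt{\frac{2}{\pi}}$$

$$V(Y) = E(Y^2) - \left(E(Y)\right)^2 = \sigma^2 - \sigma^2\,\frac{2}{\pi} = \sigma^2\left(1 - \frac{2}{\pi}\right)$$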
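
These closed forms can be compared with a direct numerical integration of $y^k f(y)$ over $[0, +\infty[$. The sketch below is illustrative only: the function names, the truncation of the integral at $12\sigma$ and the midpoint rule are assumptions of this example.

```python
import math

def half_normal_pdf(y: float, sigma: float) -> float:
    """Half-normal density sqrt(2/pi) / sigma * exp(-y^2 / (2 sigma^2)), y >= 0."""
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-y * y / (2.0 * sigma * sigma))

def half_normal_moment(k: int, sigma: float) -> float:
    """Closed forms: even and odd order moments of the half-normal law."""
    n, odd = divmod(k, 2)
    if odd:
        return sigma ** k * 2 ** n * math.sqrt(2.0 / math.pi) * math.factorial(n)
    return sigma ** k * math.factorial(2 * n) / (math.factorial(n) * 2 ** n)

def numeric_moment(k: int, sigma: float, steps: int = 200_000) -> float:
    """Midpoint-rule integral of y^k * pdf on [0, 12 sigma] (tail is negligible)."""
    h = 12.0 * sigma / steps
    return h * sum(((i + 0.5) * h) ** k * half_normal_pdf((i + 0.5) * h, sigma)
                   for i in range(steps))

sigma = 1.5
for k in range(5):
    print(k, numeric_moment(k, sigma), half_normal_moment(k, sigma))

variance = half_normal_moment(2, sigma) - half_normal_moment(1, sigma) ** 2
print(variance, sigma ** 2 * (1.0 - 2.0 / math.pi))
```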


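But one property is still missing for our needs in the section of Industrial Engineering (Design of Experiments): the value of the median!

So let $M_e$ denote the median of the half-normal distribution. Then by definition it follows:

$$0.50 = F(M_e) = \int_{0}^{M_e} f(y)\,\mathrm{d}y = \frac{\sqrt{2}}{\sigma\sqrt{\pi}}\int_{0}^{M_e} e^{-\frac{y^2}{2\sigma^2}}\,\mathrm{d}y$$

Substituting $y/(\sqrt{2}\sigma) = u$ we have:

$$0.50 = F(M_e) = \frac{2}{\sqrt{\pi}}\int_{0}^{\frac{M_e}{\sqrt{2}\sigma}} e^{-u^2}\,\mathrm{d}u$$

We recognize here the Error function (see section Thermodynamics). Therefore:

$$0.50 = F(M_e) = \mathrm{erf}\left(\frac{M_e}{\sqrt{2}\sigma}\right) \;\Leftrightarrow\; \mathrm{erf}^{-1}(0.50) = \frac{M_e}{\sqrt{2}\sigma}$$

Spreadsheet software like Microsoft Excel has no built-in inverse error function, but since $\mathrm{erf}^{-1}(0.50) = \Phi^{-1}(0.75)/\sqrt{2}$ we can write:

=NORM.S.INV(0.75)/SQRT(2) = 0.476936...

(this is not to be confused with the complementary error function, =ERFC(0.5) = 0.479500122, which happens to be numerically close). Therefore:

$$M_e = 0.476936\,\sqrt{2}\,\sigma \cong 0.67\,\sigma \qquad (7.744)$$

The technique that we will see in the section Industrial Engineering makes the approximation that therefore:

$$1.5\cdot M_e \cong \sigma$$

indeed... it's engineering...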
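
For readers without a spreadsheet at hand, the median can also be obtained numerically by inverting the error function. The following sketch assumes a simple bisection on Python's `math.erf`; the function name `inverse_erf` and the bracketing interval are choices made for this example, not something taken from the text.

```python
import math

def inverse_erf(p: float, tol: float = 1e-12) -> float:
    """Solve erf(z) = p for 0 <= p < 1 by bisection (erf is increasing)."""
    lo, hi = 0.0, 6.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if math.erf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

sigma = 1.0
median = math.sqrt(2.0) * sigma * inverse_erf(0.5)
print(inverse_erf(0.5))   # about 0.4769 (inverse error function at 0.5)
print(math.erfc(0.5))     # about 0.4795 (complementary error function, a different quantity)
print(median)             # about 0.6745 * sigma
print(1.5 * median)       # about 1.01 * sigma, hence the rule 1.5 * Me ~ sigma
```

The value $0.6745\,\sigma$ is also the upper quartile of the $\mathcal{N}(0,\sigma^2)$ distribution, as expected from the folding construction.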



7.6.23 Benford Distribution 

This distribution was discovered first in 1881 by Simon Newcomb, an American astronomer, 
after he saw that the wear (and so the use) of the preferred first pages of logarithms tables (at 
this time there we compiled into books). Frank Benford, around 1938 remarked at his turn this 
unequal wear, believing he was the first to formulate this law that unduly bears his name today 
and arrived at the same results after having listed tens of thousands of data (lengths of rivers , 
stock quotes, etc.). 

There is also one possible explanation: we need more often to extract the logarithm of numbers 
starting with 1 that numbers starting with 9, implying that the first are in "bigger quantity" than 
the second one. 

Although this idea may seem quite implausible, Benford set out to test his hypothesis.
Nothing could be simpler: he studied tables of numerical values and calculated the percentage of
occurrence of the left-most digit (first significant digit). The results obtained confirmed his intuition:

First digit                   1     2     3     4     5     6     7     8     9
Occurrence probability (%)    30.1  17.6  12.5  9.7   7.9   6.7   5.8   5.1   4.6

Table 4.27 - Occurrence of a digit following the Benford distribution

From these data, Benford found experimentally that the probability of a number
beginning with the digit n (except 0) is (we will prove this later) given by the relation:

$$P(n) = \log_{10}\left(1 + \frac{1}{n}\right)$$

named "Benford distribution" (or "Benford law").
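
The values of Table 4.27 can be reproduced directly from the relation $\log_{10}(1 + 1/n)$. The short sketch below does this and also accumulates the probabilities; the variable names are purely illustrative.

```python
from math import log10

# Benford probabilities for the first significant digit n = 1, ..., 9
benford = {n: log10(1 + 1 / n) for n in range(1, 10)}

# Cumulative probabilities
cumulative = {}
running = 0.0
for n in range(1, 10):
    running += benford[n]
    cumulative[n] = running

for n in range(1, 10):
    print(f"{n}: {100 * benford[n]:5.1f} %   cumulative {100 * cumulative[n]:5.1f} %")
```

The nine probabilities sum to 1 because the product of the terms (n + 1)/n telescopes to 10.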

Here is a Maple plot of the previous function: 



Figure 4.106 - Plot of the Benford function (cumulative distribution function)


It should be noted that this distribution applies only to lists of values that are "natural", that is to
say numbers with physical meaning. It obviously does not work on a list of randomly generated numbers.

The Benford distribution has been tested on all kinds of tables: lengths of the rivers of the world,
country areas, election results, grocery store price lists... It holds almost every time.

The distribution is said to be independent of the selected unit. If we take for example a
supermarket price list, it works just as well with the prices expressed in dollars as with the same
prices converted into euros.

This strange phenomenon remained unexplained and little studied until quite recently. Then a 
general proof was given in 1996, which uses the central limit theorem. 

As surprising as it may seem, this distribution has found an application: it is said that the IRS uses
it to detect false statements. The principle is based on the restriction seen above: Benford’s
distribution applies only to values with physical meaning.

Thus, if there is a universal probability distribution P(n) on such numbers, it should be
invariant under scaling, such that:

$$P(kn) = f(k)P(n)$$

As:

$$\int P(n)\,dn = 1 \quad\Rightarrow\quad \int P(kn)\,dn = \frac{1}{k}$$

the normalization of the distribution gives:

$$f(k) = \frac{1}{k}$$

If we differentiate $P(kn) = f(k)P(n)$ with respect to k we obtain:

$$\frac{\partial}{\partial k}P(kn) = P(n)\frac{\partial}{\partial k}\left(\frac{1}{k}\right) \quad\Rightarrow\quad nP'(kn) = -\frac{1}{k^2}P(n)$$

Choosing k = 1 we have:

$$nP'(n) = -P(n)$$

This differential equation has for solution:

$$\int \frac{dP}{P} = -\int \frac{dn}{n} \quad\Rightarrow\quad P(n) = \frac{1}{n} \tag{7.753}$$

This function is not, strictly speaking, a distribution function (its integral diverges) and, secondly,
physical and human laws impose limits.



So we have to compare this distribution with respect to an arbitrary reference. Thus, if the
decimal number studied contains several powers of 10 (ten digits in total: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
the probability that the first nonzero digit (decimal) is D is given by:

$$P_D = \frac{\int_D^{D+1} P(n)\,dn}{\int_1^{10} P(n)\,dn}$$

The limits of the integral in the denominator are from 1 to 10 because the null value is prohibited.

The integral in the denominator gives:

$$\int_1^{10} P(n)\,dn = \int_1^{10} \frac{1}{n}\,dn = \ln(10) - \ln(1) = \ln(10)$$

The integral in the numerator gives:

$$\int_D^{D+1} P(n)\,dn = \int_D^{D+1} \frac{1}{n}\,dn = \ln(D+1) - \ln(D) = \ln\left(\frac{D+1}{D}\right) \tag{7.756}$$

By the properties of logarithms (see section Functional Analysis) we have:

$$P_D = \frac{\ln\left(\frac{D+1}{D}\right)}{\ln(10)} = \frac{\ln\left(1 + \frac{1}{D}\right)}{\ln(10)} = \log_{10}\left(1 + \frac{1}{D}\right)$$
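
The scale-invariance argument can be illustrated with a small simulation. The sketch below draws numbers with a density proportional to 1/n over three full decades (an assumption made so that the density can be normalized), then changes the unit with an arbitrary factor. The factor 2.54, the seed, and the sample size are hypothetical choices for the example.

```python
import random
from collections import Counter
from math import log10

def first_digit(x: float) -> int:
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:e}"[0])

def digit_frequencies(values):
    """Relative frequency of each first digit 1..9 in a list of positive numbers."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

random.seed(12345)  # fixed seed so that the run is reproducible
N_SAMPLES = 200_000

# A density proportional to 1/n on [1, 1000) is the same as drawing
# log10(n) uniformly on [0, 3).
sample = [10 ** random.uniform(0.0, 3.0) for _ in range(N_SAMPLES)]
rescaled = [2.54 * v for v in sample]  # arbitrary change of unit

freq = digit_frequencies(sample)
freq_rescaled = digit_frequencies(rescaled)
for d in range(1, 10):
    print(d, f"{log10(1 + 1 / d):.3f}", f"{freq[d]:.3f}", f"{freq_rescaled[d]:.3f}")
```

Both columns of observed frequencies should stay close to the theoretical column, before and after the change of unit.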


However, Benford’s distribution applies not only to scale-invariant data but also to numbers
from any source. Explaining this case requires a more rigorous investigation using the central
limit theorem. This demonstration was carried out only in 1996 by T. Hill, with an approach using
the distribution of distributions.

To summarize an important part of everything we have seen so far, the picture below is very
useful because it shows the relations between the 76 most common univariate distributions
(57 continuous and 19 discrete):


[Figure legend. Properties: C: Convolution, F: Forgetfulness, L: Linear combination, M: Minimum, P: Product,
R: Residual, S: Scaling, V: Variate generation, X: Maximum. Relationships: special cases, transformations, limiting.]

Figure 4.107 - Relations between distributions (Source: AMS Lawrence M. Leemis and Jacquelyn T. McQueston)

7.7 Likelihood Estimators 

What follows is of extreme importance in the field of statistics and is widely used in practice.
It is therefore important to pay attention! Besides the fact that we will use this technique in
this chapter, we shall find it again in the chapter on Numerical Methods for advanced and generalized
linear regression and also in the chapter on Industrial Engineering in the context of parametric
estimation of reliability.

We assume that we have observations $x_1, x_2, x_3, \ldots, x_n$ which are realizations of unbiased
independent random variables (in the sense that they are randomly selected from a batch)
$X_1, X_2, X_3, \ldots, X_n$, all following the same unknown probability distribution.

Suppose we proceed by trial and error to estimate the unknown probability distribution P. One
way to proceed is to ask whether the observations $x_1, x_2, x_3, \ldots, x_n$ had a high probability of
occurring or not under this arbitrary probability distribution P.

For this we need to calculate the joint probability that the observations $x_1, x_2, x_3, \ldots, x_n$ had of
occurring with the probabilities $p_1, p_2, p_3, \ldots, p_n$. By independence, this joint probability is equal to:




$$\prod_{i=1}^{n} P(X_i = x_i) \tag{7.759}$$

denoting by the letter P the assumed probability distribution associated with $p_1, p_2, p_3, \ldots, p_n$. You
must admit that it would be particularly awkward, intuitively, to choose a
probability distribution (with its parameters!) that minimizes this quantity...

Instead, we will seek the probabilities $p_1, p_2, p_3, \ldots, p_n$ (or the associated parameters of the
probability distribution) that maximize $\prod_{i=1}^{n} P(X_i = x_i)$, that is to say, that make the observations
$x_1, x_2, x_3, \ldots, x_n$ the most likely possible.

This leads us to seek the parameter(s) $\theta$ that maximize the quantity:

$$L_n(\theta) = \prod_{i=1}^{n} P_\theta(X_i = x_i) \tag{7.760}$$

where the parameter $\theta$ is often, in undergraduate-level problems, a first order moment
(mean) or a second order moment (variance).

The quantity L is named "likelihood". It is a function of the parameter(s) $\theta$ and of the observations
$x_1, x_2, x_3, \ldots, x_n$.

The value(s) of the parameter(s) $\theta$ that maximize the likelihood $L_n(\theta)$ are called "maximum
likelihood estimators" (MLE).
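
To make the definition concrete, here is a minimal trial-and-error sketch. It assumes a Bernoulli model (success with probability θ) and a small hypothetical data set; the grid of candidate values is an arbitrary illustrative choice, not a method prescribed by the text.

```python
from math import prod

# Hypothetical observations: 1 = success, 0 = failure (7 successes out of 10)
observations = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

def likelihood(theta: float, data) -> float:
    """L_n(theta) = product of P_theta(X_i = x_i) for a Bernoulli model."""
    return prod(theta if x == 1 else 1.0 - theta for x in data)

# Trial and error over a grid of candidate parameters, keeping the most likely one
grid = [i / 100 for i in range(1, 100)]
theta_hat = max(grid, key=lambda t: likelihood(t, observations))

print(theta_hat, likelihood(theta_hat, observations))
```

The candidate retained by the grid search is the observed proportion of successes, which anticipates the closed-form results derived for the distributions that follow.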

In the very special but useful case of the Normal distribution, one of the parameters $\theta$ will
be the variance (see the concrete example a little further on), and it can be considered intuitive to the
physicist that, to maximize the probability, the standard deviation should be as small as possible
(so that the maximum number of events falls in the same interval). Thus, when we calculate an
MLE whose variance is the smallest among several possible estimators, we speak of a UMVU estimator,
for "Uniform Minimum Variance Unbiased", because its own variance should be as small as
possible. This can be demonstrated (but the proof is not very elegant) using the definition of
the Fisher Information and the Fréchet theorem (or Cramér-Rao) that makes use of the Cauchy-Schwarz
inequality (see section Vector Calculus) and the analogy between mean and scalar
product... This demonstration will not be given in this book.

Let us now work through five small examples (very classic, useful and important in industry), in
order of importance (i.e. not necessarily in order of ease...): the distribution function of Gauss-Laplace
(Normal distribution), the Poisson distribution, the binomial distribution (and so the
Geometric distribution), the Weibull distribution and finally the Gamma distribution.


These five examples are important because they are used in SPC (statistical process control) in various
international companies around the world (see section Industrial Engineering).


7.7.1 Normal Distribution MLE 

Let $x_1, \ldots, x_n$ be an n-sample of identically distributed random variables assumed to follow
a Gauss-Laplace (Normal) distribution with parameters $\mu$ and $\sigma^2$.

We are looking for the values of the maximum likelihood estimators $\theta$ that maximize the
likelihood $L_n(\theta)$ of the Normal distribution.

We proved earlier that the density of a Gaussian random variable is given by:

$$P(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{7.762}$$

The likelihood is then given by:

$$L(\mu, \sigma) = \prod_{i=1}^{n} P(x_i, \mu, \sigma) = \frac{1}{\sigma^n \left(\sqrt{2\pi}\right)^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}$$

Maximizing a function or maximizing its logarithm is equivalent; therefore the "log-likelihood" will be:

$$\ln(L(\mu, \sigma)) = -\frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$$

To determine the two estimators of the Normal distribution, first let us fix the standard deviation.
To do this, we differentiate $\ln(L(\mu, \sigma))$ with respect to $\mu$ and look for the value of the mean for
which the derivative is equal to zero.

After simplification, the following term remains, which must be equal to zero:

$$\sum_{i=1}^{n}(x_i - \mu) = 0 \tag{7.765}$$

Thus, the maximum likelihood estimator of the expected mean of the Normal distribution is,
after rearrangement:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and we see that it is simply the arithmetic mean (also named "sample mean").

Let us now fix the mean. Setting the derivative of $\ln(L(\mu, \sigma))$ with respect to $\sigma$ to zero leads us to:

$$\frac{d}{d\sigma}\ln(L(\mu, \sigma)) = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2 = 0$$





This allows us to write the maximum likelihood estimator for the standard deviation (the variance
when the mean is known, under an assumed distribution also supposed known!):

$$\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2}$$

that some people also name "Pearson standard deviation"...
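
The two closed-form estimators can be checked numerically. The sketch below is only an illustration: it simulates a sample with hypothetical "true" parameters, replaces the unknown mean by its estimator when computing the standard deviation, and verifies that small moves away from the estimators lower the log-likelihood.

```python
import random
from math import log, pi, sqrt

random.seed(2024)  # reproducible illustrative sample
MU_TRUE, SIGMA_TRUE = 10.0, 2.0  # hypothetical "true" parameters
x = [random.gauss(MU_TRUE, SIGMA_TRUE) for _ in range(500)]
n = len(x)

def log_likelihood(mu: float, sigma: float) -> float:
    """ln L(mu, sigma) of the Normal model for the sample x."""
    ss = sum((xi - mu) ** 2 for xi in x)
    return -n / 2 * log(2 * pi) - n * log(sigma) - ss / (2 * sigma ** 2)

# Closed-form maximum likelihood estimators
mu_hat = sum(x) / n
sigma_hat = sqrt(sum((xi - mu_hat) ** 2 for xi in x) / n)

best = log_likelihood(mu_hat, sigma_hat)
# Any small move away from the estimators should lower the log-likelihood
neighbours = [
    log_likelihood(mu_hat + dm, sigma_hat + ds)
    for dm in (-0.05, 0.0, 0.05)
    for ds in (-0.05, 0.0, 0.05)
    if (dm, ds) != (0.0, 0.0)
]
print(mu_hat, sigma_hat, best, max(neighbours))
```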

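Even if it is a little bit redundant, some readers asked us to show the proof of the estimator of the
covariance matrix (and therefore the correlation matrix).

Remember that we proved earlier that for the bivariate case we have:

$$f(\vec{x}) = \frac{1}{2\pi\sqrt{|\Sigma|}}\, e^{-\frac{1}{2}(\vec{x}-\vec{\mu})^{\top}\Sigma^{-1}(\vec{x}-\vec{\mu})}$$

In fact the relation has the same form for the multivariate case with $\Sigma$!

The log-likelihood, for T observations of a vector with n components, is therefore immediate by
analogy with the univariate case:

$$\ln(L(\vec{\mu}, \Sigma)) = -\frac{nT}{2}\ln(2\pi) - \frac{T}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{T}(\vec{x}_i-\vec{\mu})^{\top}\Sigma^{-1}(\vec{x}_i-\vec{\mu})$$

That we can also write (a scalar being equal to its own trace, and the trace being invariant under
cyclic permutation):

$$\ln(L(\vec{\mu}, \Sigma)) = -\frac{nT}{2}\ln(2\pi) - \frac{T}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{T}\operatorname{tr}\left((\vec{x}_i-\vec{\mu})^{\top}\Sigma^{-1}(\vec{x}_i-\vec{\mu})\right)$$

$$= -\frac{nT}{2}\ln(2\pi) - \frac{T}{2}\ln|\Sigma| - \frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}\sum_{i=1}^{T}(\vec{x}_i-\vec{\mu})(\vec{x}_i-\vec{\mu})^{\top}\right)$$

$$= -\frac{nT}{2}\ln(2\pi) - \frac{T}{2}\ln|\Sigma| - \frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}S\right)$$

Where by definition and using the estimator of the mean:

$$S = \sum_{i=1}^{T}(\vec{x}_i-\hat{\vec{\mu}})(\vec{x}_i-\hat{\vec{\mu}})^{\top}$$

Then we deduce that:

$$\frac{\partial}{\partial\Sigma}\ln(L(\hat{\vec{\mu}}, \Sigma)) = -\frac{T}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1}S\,\Sigma^{-1} = 0$$

and we get finally:

$$\hat{\Sigma} = \frac{1}{T}S$$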

However, we have not yet defined what a good estimator is! What we mean here is:


• If the mean of an estimator is equal to the true value of the parameter it estimates, we say that
this estimator is "unbiased" and that’s obviously what we want!


• If the mean of an estimator is not equal to the true value of the parameter it estimates, then we
say that this estimator is "biased" and is necessarily less good...

In the previous example, the average is unbiased (this is trivial, as the mean of the arithmetic
mean is equal to the theoretical mean). But what about the variance (and hence the standard deviation)?

A simple little calculation using the linearity of the mean (since the random variables are identically
distributed) will give us the answer in the case where the theoretical average (mean) is approximated,
as in practice (industry), by the estimator of the mean (most common case).

So we have for the calculation of the mean of the "sample variance":

$$E(\hat{\sigma}^2) = E\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2\right) = E\left(\frac{1}{n}\sum_{i=1}^{n}\left(x_i^2 - 2x_i\hat{\mu} + \hat{\mu}^2\right)\right)$$

$$= E\left(\frac{1}{n}\sum_{i=1}^{n}x_i^2 - 2\hat{\mu}\,\frac{1}{n}\sum_{i=1}^{n}x_i + \frac{1}{n}\,n\hat{\mu}^2\right) = E\left(\frac{1}{n}\sum_{i=1}^{n}x_i^2 - 2\hat{\mu}^2 + \hat{\mu}^2\right)$$

$$= E\left(\frac{1}{n}\sum_{i=1}^{n}x_i^2 - \hat{\mu}^2\right) = \frac{1}{n}\sum_{i=1}^{n}E(x_i^2) - E(\hat{\mu}^2)$$

However, as the variables are supposed to be identically distributed:

$$E(\hat{\sigma}^2) = \frac{1}{n}\sum_{i=1}^{n}E(x_i^2) - E(\hat{\mu}^2) = \frac{1}{n}\,n\,E(x^2) - E(\hat{\mu}^2) = E(x^2) - E(\hat{\mu}^2)$$

And as we have (Huyghens theorem):

$$V(X) = E(X^2) - E(X)^2$$

$$V(\hat{\mu}) = E(\hat{\mu}^2) - E(\hat{\mu})^2 = E(\hat{\mu}^2) - E(X)^2$$

wherein the second relation can be written only because we use the maximum likelihood estimator
of the average (empirical average). Therefore, combining the two above relations with the
one before them, we get:

$$E(\hat{\sigma}^2) = E(x^2) - E(\hat{\mu}^2) = \left(V(X) + E(X)^2\right) - \left(V(\hat{\mu}) + E(X)^2\right) = V(X) - V(\hat{\mu}) \tag{7.779}$$

and as:

$$V(X) = \sigma^2 \quad\text{and}\quad V(\hat{\mu}) = \frac{\sigma^2}{n} \tag{7.780}$$

Finally we have:

$$E(\hat{\sigma}^2) = \sigma^2 - \frac{\sigma^2}{n} = \left(1 - \frac{1}{n}\right)\sigma^2 = \frac{n-1}{n}\sigma^2 \tag{7.781}$$

so we have a bias of minus one squared standard error:

$$-\frac{\sigma^2}{n}$$

and we then say that this estimator has a negative bias (it underestimates the true value!).
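
The factor (n - 1)/n can be observed by simulation. The following sketch is an illustration under stated assumptions: Normal data with a hypothetical standard deviation of 2, a deliberately small sample size of 5, and a fixed seed. It averages the biased estimator over many samples.

```python
import random

random.seed(7)   # reproducible run
SIGMA = 2.0      # hypothetical true standard deviation (variance 4)
N = 5            # small sample size, where the bias is most visible
TRIALS = 50_000

def biased_variance(sample):
    """Empirical variance dividing by n and using the sample mean."""
    m = sum(sample) / len(sample)
    return sum((v - m) ** 2 for v in sample) / len(sample)

estimates = [
    biased_variance([random.gauss(0.0, SIGMA) for _ in range(N)])
    for _ in range(TRIALS)
]
mean_biased = sum(estimates) / TRIALS
expected = (N - 1) / N * SIGMA ** 2          # (n - 1)/n * sigma^2
mean_corrected = N / (N - 1) * mean_biased   # after multiplying by n/(n - 1)

print(mean_biased, expected, mean_corrected)
```

The average of the biased estimator settles near 3.2 rather than 4, and multiplying by n/(n - 1) brings it back near the true variance.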



We also note that the estimator tends towards an unbiased estimator of the variance (USV)
when the number of items tends to infinity, $n \to +\infty$. We say that we have an
"asymptotically unbiased estimator".

It is important to note that we have thereby proved that the mean of the empirical variance tends
towards the theoretical variance when n tends to infinity, and this whether or not the data follow
a Normal distribution.


An estimator is named a "consistent estimator" if it converges in probability, when $n \to +\infty$,
towards the true parameter value.

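By the properties of the mean, we get:

$$E\left(\frac{n}{n-1}\hat{\sigma}^2\right) = \frac{n}{n-1}E(\hat{\sigma}^2) = \frac{n}{n-1}\cdot\frac{n-1}{n}\sigma^2 = \sigma^2$$

We have then:

$$\tilde{\sigma} = \sqrt{\frac{n}{n-1}\hat{\sigma}^2} = \sqrt{\frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\hat{\mu})^2}$$
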
simply called the "standard deviation" ... (that must not be confused with the "standard error" 
as we shall see later). 

So we finally summarize the two important previous results as follows:

1. The "biased maximum likelihood estimator", also named "empirical standard deviation"
or "sample standard deviation" or "Pearson standard deviation"..., is therefore given by:

$$\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2}$$

and is asymptotically unbiased when $n \to +\infty$. We find this standard deviation, depending on
the context (by tradition), noted in five other ways that are:

cr*, S*, a*, S'*, S n 


and sometimes (but this is very awkward because it often generates confusion with the 
unbiased estimator) a or S. 

2. The "unbiased maximum likelihood estimator" or simply named "standard deviation":

$$\tilde{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\hat{\mu})^2}$$

which, as we can see, is a consistent estimator (when n tends to infinity it tends to the
biased maximum likelihood estimator).


We find this standard deviation depending on the context (by tradition) noted in three 
other ways that are: 


We find these last two notations often in tables and in many software packages, and we will use them
later in the development of confidence intervals and hypothesis testing!

For example, in Microsoft Excel 11.8346 the unbiased estimator is given by the STDEV()
function and the biased one by STDEVP().
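
The same pair of estimators exists in the Python standard library, which gives a convenient way to compare them. The sketch below uses a small set of hypothetical measurements; the comparison with the spreadsheet functions is by their definitions (division by n - 1 or by n).

```python
from statistics import pstdev, stdev

data = [9.8, 10.1, 10.3, 9.9, 10.4, 9.7, 10.0]  # hypothetical measurements
n = len(data)

s_biased = pstdev(data)    # divides by n      (same definition as STDEVP)
s_unbiased = stdev(data)   # divides by n - 1  (same definition as STDEV)

print(s_biased, s_unbiased)
print(s_unbiased / s_biased, (n / (n - 1)) ** 0.5)  # the two ratios coincide
```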

In total, this gives us three estimators for the same indicator! As in the overwhelming majority
of cases in industry the mean is not known, we usually use the last two boxed relations
above. Now this is where the vicious part comes in: when we calculate the bias of these two
estimators, the first is biased, the second is not. So we would tend to use only the latter. Not so!
Because we could also talk about the variance and precision of an estimator, which are also important
criteria for judging the quality of an estimator relative to another. If we were to calculate the
variance of the two estimators, then that of the first, which is biased, is smaller than that of the
second, which is unbiased! All that to say that the criterion of bias is not (by far) the only one to be
studied to judge the quality of an estimator.

Finally, it is important to remember that the factor −1 in the denominator of the unbiased
maximum likelihood estimator stems from the need to correct the mean of the biased estimator,
from which one squared standard error was initially subtracted!

7.7.2 Poisson Distribution MLE 

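Using the same method as for the Normal (Gauss-Laplace) distribution, we will seek the maximum
likelihood estimator of the Poisson distribution which, as a reminder, is given by:

$$P(x, \mu) = \frac{\mu^x}{x!}\,e^{-\mu}$$

Thus, the likelihood is given by:

$$L(\mu) = \prod_{i=1}^{n} P(x_i, \mu) = \prod_{i=1}^{n}\frac{\mu^{x_i}}{x_i!}\,e^{-\mu} = \frac{\mu^{\sum_{i=1}^{n}x_i}}{\prod_{i=1}^{n}x_i!}\,e^{-n\mu}$$

Maximizing a function or maximizing its logarithm is equivalent, therefore:

$$\ln[L(\mu)] = \ln(\mu)\sum_{i=1}^{n}x_i - \sum_{i=1}^{n}\ln(x_i!) - \mu n$$

We are now looking to maximize it:

$$\frac{\partial \ln[L(\mu)]}{\partial\mu} = \frac{1}{\mu}\sum_{i=1}^{n}x_i - n = 0$$
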


and thus we obtain the only maximum likelihood estimator, which will be:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}x_i$$

It is quite normal to find the sample mean in this example, because it is the best possible estimator
for the parameter of the Poisson distribution (which also represents the mean of a Poisson
distribution).

Knowing that the standard deviation of this particular distribution (see above, during the development
of the Poisson distribution) is the square root of the mean, we then have for the maximum
likelihood estimator of the standard deviation:

$$\hat{\sigma} = \sqrt{\hat{\mu}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}x_i}$$

One can show in the same way identical results for the exponential distribution, which is widely
used in preventive maintenance and reliability!
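
As an illustration, the sketch below evaluates the Poisson log-likelihood on a grid and compares the best grid point with the sample mean. The counts are hypothetical, and `lgamma(x + 1)` is used as a numerically convenient way to obtain ln(x!).

```python
from math import lgamma, log, sqrt

counts = [2, 4, 3, 0, 5, 1, 3, 2, 4, 3]  # hypothetical counts per time unit
n = len(counts)

def log_likelihood(mu: float) -> float:
    """ln L(mu) = ln(mu) * sum(x_i) - sum(ln(x_i!)) - mu * n."""
    return log(mu) * sum(counts) - sum(lgamma(x + 1) for x in counts) - mu * n

mu_hat = sum(counts) / n     # closed-form estimator: the sample mean
sigma_hat = sqrt(mu_hat)     # standard deviation of a Poisson variable

# Numerical cross-check on a grid of candidate values of mu
grid = [k / 100 for k in range(1, 1001)]
mu_grid = max(grid, key=log_likelihood)

print(mu_hat, sigma_hat, mu_grid)
```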

7.7.3 Binomial (and Geometric) Distribution MLE 

Using the same method as for the Normal (Gauss-Laplace) distribution and the Poisson distribution,
we will seek the maximum likelihood estimator of the Binomial distribution which, we recall, is
given by:

$$P(N, k) = C_N^k\,p^k(1-p)^{N-k} = \frac{N!}{k!(N-k)!}\,p^k(1-p)^{N-k} \tag{7.795}$$

Accordingly, the likelihood is given by:

$$L(p) = \prod_{i} P(x_i, p) = C_N^k\,p^k(1-p)^{N-k}$$

It should be remembered that the factor following the combinatorial term already expresses the
successive variables, according to what we saw during our study of the Bernoulli and Binomial
distribution functions. Hence the disappearance of the product in the preceding equality.

Maximizing a function or maximizing its logarithm is equivalent, therefore:

$$\ln[L(p)] = \ln(C_N^k) + k\ln(p) + (N-k)\ln(1-p)$$

We are now looking to maximize it:

$$\frac{\partial \ln[L(p)]}{\partial p} = \frac{k}{p} - \frac{N-k}{1-p} = 0$$




The reader may have noticed that the binomial coefficient has disappeared. Therefore,
we immediately deduce that the estimator of the binomial distribution is the same as that of the
geometric distribution.

Which gives:

$$k(1-p) - p(N-k) = k - kp - pN + pk = k - pN = 0 \tag{7.799}$$

from which we derive the maximum likelihood estimator:

$$\hat{p} = \frac{k}{N}$$

This result is quite intuitive if we consider the classic example of a coin that has one chance in
two of landing on one of its faces. The probability p is the number of times k a given face
was observed, divided by the total number of tosses (all sides combined).
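
The result can be verified numerically in a few lines. The sketch below uses a hypothetical experiment (18 heads in 50 tosses) and checks that the derivative of the log-likelihood vanishes at k/N.

```python
from math import comb, log

N, k = 50, 18  # hypothetical experiment: 18 heads observed in 50 tosses

def log_likelihood(p: float) -> float:
    """ln L(p) = ln C(N, k) + k ln(p) + (N - k) ln(1 - p)."""
    return log(comb(N, k)) + k * log(p) + (N - k) * log(1 - p)

def score(p: float) -> float:
    """Derivative of the log-likelihood: k/p - (N - k)/(1 - p)."""
    return k / p - (N - k) / (1 - p)

p_hat = k / N  # closed-form maximum likelihood estimator

print(p_hat, score(p_hat), log_likelihood(p_hat))
```

The combinatorial term only shifts the log-likelihood by a constant, which is why it plays no role in the location of the maximum.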


In practice, it is not so easy to apply these estimators! We must carefully consider which
are most suitable for a given experiment and ideally also calculate the mean squared error
(standard error) of each of the estimators of the mean (as we have already done for the
empirical mean earlier). In short... it is a long process of reflection.

7.7.4 Weibull Distribution MLE 

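We give in the section Industrial Engineering a very detailed study of the three-parameter
Weibull distribution, with its standard deviation and mean, because as we mentioned it is widely
used in the field of reliability engineering.

Unfortunately, the three parameters of this distribution are unknown in practice. Using estimators,
however, we can determine the expression of two of the three, assuming $\gamma$ is zero. This
gives us the following Weibull distribution, named "Weibull distribution with two parameters":

$$P(x, \beta, \eta) = \frac{\beta}{\eta}\left(\frac{x}{\eta}\right)^{\beta-1} e^{-\left(\frac{x}{\eta}\right)^{\beta}}$$

with, as a reminder, $\beta > 0$ and $\eta > 0$.

The likelihood is therefore given by:

$$L(\beta, \eta) = \prod_{i=1}^{n} P(x_i, \beta, \eta) = \prod_{i=1}^{n}\frac{\beta}{\eta}\left(\frac{x_i}{\eta}\right)^{\beta-1} e^{-\left(\frac{x_i}{\eta}\right)^{\beta}} = \left(\frac{\beta}{\eta}\right)^{n}\left[\prod_{i=1}^{n}\left(\frac{x_i}{\eta}\right)^{\beta-1}\right] e^{-\sum_{i=1}^{n}\left(\frac{x_i}{\eta}\right)^{\beta}}$$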




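Maximizing a function or maximizing its logarithm is equivalent, therefore:

$$\ln(L(\beta, \eta)) = n\ln\left(\frac{\beta}{\eta}\right) - \sum_{i=1}^{n}\left(\frac{x_i}{\eta}\right)^{\beta} + \sum_{i=1}^{n}\ln\left(\left(\frac{x_i}{\eta}\right)^{\beta-1}\right)$$

$$= n\ln\left(\frac{\beta}{\eta}\right) - \frac{1}{\eta^{\beta}}\sum_{i=1}^{n}x_i^{\beta} + (\beta-1)\sum_{i=1}^{n}\ln\left(\frac{x_i}{\eta}\right)$$

Now we seek to maximize this by remembering that (see section Differential And Integral Calculus):

$$\frac{d}{dx}a^{x} = a^{x}\ln(a) \quad\text{and}\quad \frac{d}{dx}a^{-x} = -a^{-x}\ln(a)$$

Setting the derivative with respect to $\beta$ to zero gives:

$$\frac{\partial \ln(L(\beta, \eta))}{\partial\beta} = \frac{n}{\beta} - \sum_{i=1}^{n}\left(\frac{x_i}{\eta}\right)^{\beta}\ln\left(\frac{x_i}{\eta}\right) + \sum_{i=1}^{n}\ln\left(\frac{x_i}{\eta}\right) = 0$$

And we get for the second parameter:

$$\frac{\partial \ln(L(\beta, \eta))}{\partial\eta} = -\frac{n}{\eta} + \frac{\beta}{\eta^{\beta+1}}\sum_{i=1}^{n}x_i^{\beta} + (1-\beta)\frac{n}{\eta} = 0 \quad\Rightarrow\quad \frac{1}{\eta^{\beta}}\sum_{i=1}^{n}x_i^{\beta} - n = 0$$

Finally, to summarize with the correct notations (and in the order of resolution in practice):

$$\frac{n}{\hat{\beta}} - \frac{1}{\hat{\eta}^{\hat{\beta}}}\sum_{i=1}^{n}x_i^{\hat{\beta}}\ln\left(\frac{x_i}{\hat{\eta}}\right) + \sum_{i=1}^{n}\ln\left(\frac{x_i}{\hat{\eta}}\right) = 0 \quad\text{and}\quad \frac{1}{\hat{\eta}^{\hat{\beta}}}\sum_{i=1}^{n}x_i^{\hat{\beta}} - n = 0$$

Solving these equations involves heavy computations and we can a priori do nothing with them
in conventional spreadsheet software such as Microsoft Excel or OpenOffice Calc without
programming (at least as far as we know...).

We then take a different approach by writing our Weibull distribution with two parameters as:

$$P(x, \beta, \theta) = \frac{\beta}{\theta}\,x^{\beta-1}\,e^{-\frac{x^{\beta}}{\theta}} \tag{7.809}$$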



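with, as a reminder, $\beta > 0$ and $\theta > 0$.
Therefore the likelihood is given by:

$$L(\beta, \theta) = \prod_{i=1}^{n} P(x_i, \beta, \theta) = \prod_{i=1}^{n}\frac{\beta}{\theta}\,x_i^{\beta-1}\,e^{-\frac{x_i^{\beta}}{\theta}} = \left(\frac{\beta}{\theta}\right)^{n} e^{-\frac{1}{\theta}\sum_{i=1}^{n}x_i^{\beta}}\prod_{i=1}^{n}x_i^{\beta-1}$$

Maximizing a function or maximizing its logarithm is equivalent, therefore:

$$\ln(L(\beta, \theta)) = n\ln\left(\frac{\beta}{\theta}\right) - \frac{1}{\theta}\sum_{i=1}^{n}x_i^{\beta} + \sum_{i=1}^{n}\ln\left(x_i^{\beta-1}\right) = n\ln\left(\frac{\beta}{\theta}\right) - \frac{1}{\theta}\sum_{i=1}^{n}x_i^{\beta} + (\beta-1)\sum_{i=1}^{n}\ln(x_i)$$

Now we seek to maximize this by remembering that (see section Differential And Integral Calculus):

$$\frac{d}{dx}a^{x} = a^{x}\ln(a) \quad\text{and}\quad \frac{d}{dx}a^{-x} = -a^{-x}\ln(a)$$

$$\frac{\partial \ln(L(\beta, \theta))}{\partial\beta} = \frac{n}{\beta} - \frac{1}{\theta}\sum_{i=1}^{n}x_i^{\beta}\ln(x_i) + \sum_{i=1}^{n}\ln(x_i) = 0$$

And we have for the second parameter:

$$\frac{\partial \ln(L(\beta, \theta))}{\partial\theta} = -\frac{n}{\theta} + \frac{1}{\theta^{2}}\sum_{i=1}^{n}x_i^{\beta} = 0$$

It is then immediate that:

$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n}x_i^{\beta}$$

which, injected into the equation:

$$\frac{n}{\beta} - \frac{1}{\theta}\sum_{i=1}^{n}x_i^{\beta}\ln(x_i) + \sum_{i=1}^{n}\ln(x_i) = 0$$

gives:

$$\frac{n}{\beta} - \frac{n\sum_{i=1}^{n}x_i^{\beta}\ln(x_i)}{\sum_{i=1}^{n}x_i^{\beta}} + \sum_{i=1}^{n}\ln(x_i) = 0$$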





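or, equivalently, after division by n:

$$\frac{\sum_{i=1}^{n}x_i^{\beta}\ln(x_i)}{\sum_{i=1}^{n}x_i^{\beta}} - \frac{1}{\beta} - \frac{1}{n}\sum_{i=1}^{n}\ln(x_i) = 0$$

The resolution of the two equations (in order from top to bottom):

$$\frac{\sum_{i=1}^{n}x_i^{\hat{\beta}}\ln(x_i)}{\sum_{i=1}^{n}x_i^{\hat{\beta}}} - \frac{1}{\hat{\beta}} - \frac{1}{n}\sum_{i=1}^{n}\ln(x_i) = 0 \qquad\text{and}\qquad \hat{\theta} = \frac{1}{n}\sum_{i=1}^{n}x_i^{\hat{\beta}}$$

can easily be carried out with the Goal Seek tool of Microsoft Excel or OpenOffice Calc.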
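
Outside a spreadsheet, the same two-step resolution can be sketched with a simple bisection on the equation in β, followed by the direct evaluation of θ. This is only an illustrative sketch: the simulated sample, the "true" parameters, the seed and the search bracket are all assumptions, and bisection is used here as a stand-in for the goal-seek step.

```python
import random
from math import log

random.seed(42)  # reproducible illustrative sample
BETA_TRUE, ETA_TRUE = 1.5, 2.0  # hypothetical shape and scale
# random.weibullvariate(alpha, beta): alpha is the scale, beta the shape
x = [random.weibullvariate(ETA_TRUE, BETA_TRUE) for _ in range(5000)]
n = len(x)
logs = [log(xi) for xi in x]
mean_log = sum(logs) / n

def g(beta: float) -> float:
    """Left-hand side of the equation in beta; its root is the estimator."""
    powers = [xi ** beta for xi in x]
    num = sum(p * l for p, l in zip(powers, logs))
    return num / sum(powers) - 1.0 / beta - mean_log

# Bisection: g is increasing in beta, negative near 0 and positive for large beta
lo, hi = 0.01, 20.0  # bracket assumed to contain the root
for _ in range(60):
    mid = (lo + hi) / 2
    if g(mid) > 0:
        hi = mid
    else:
        lo = mid
beta_hat = (lo + hi) / 2

theta_hat = sum(xi ** beta_hat for xi in x) / n  # second equation
eta_hat = theta_hat ** (1 / beta_hat)            # since theta = eta ** beta

print(beta_hat, theta_hat, eta_hat)
```

The last line converts θ back to the scale parameter η of the first parametrization, using the identity θ = η^β that follows from comparing the two forms of the density.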

7.7.5 Gamma Distribution MLE 

Here we will use a technique named "method of moments" to determine the estimators of the
parameters of the Gamma distribution.

Suppose that $X_1, \ldots, X_n$ are independent and identically distributed random variables following
the Gamma distribution with density:

$$P_{a,\lambda}(x) = \frac{\lambda^{a}}{\Gamma(a)}\,x^{a-1}e^{-\lambda x}\,1_{[0,+\infty[}(x) \tag{7.820}$$

We seek to estimate $a$ and $\lambda$. For this, we first determine some theoretical moments. The first
moment is the expected mean which, as we have proved before, is given by:

$$E(X) = m_1 = \frac{a}{\lambda} \tag{7.821}$$

and the second moment, the mean of the square of the random variable, is, as we have implicitly
proved in the proof of the variance of the Gamma distribution, given by:

$$E(X^2) = m_2 = \frac{a(a+1)}{\lambda^{2}}$$

We then express the relation between the parameters and the theoretical moments:

$$m_1 = \frac{a}{\lambda} \qquad m_2 = \frac{a(a+1)}{\lambda^{2}}$$



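The resolution of this simple system gives:

$$a = \frac{m_1^{2}}{m_2 - m_1^{2}} \qquad \lambda = \frac{m_1}{m_2 - m_1^{2}}$$

Once this system is established, the method of moments consists in using the empirical moments, i.e.
for our example the first two, $\hat{m}_1, \hat{m}_2$:

$$\hat{m}_1 = \frac{X_1 + \ldots + X_n}{n} \qquad \hat{m}_2 = \frac{X_1^{2} + \ldots + X_n^{2}}{n} \tag{7.825}$$

that we set equal to the true theoretical moments... Therefore, it comes:

$$\hat{a} = \frac{\hat{m}_1^{2}}{\hat{m}_2 - \hat{m}_1^{2}} \qquad \hat{\lambda} = \frac{\hat{m}_1}{\hat{m}_2 - \hat{m}_1^{2}}$$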
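
The method of moments is easy to try on simulated data. In the sketch below the "true" parameters, the seed and the sample size are hypothetical; note that the standard library generator is parametrized by the scale, that is 1/λ with the notation of this section.

```python
import random

random.seed(314)  # reproducible illustrative sample
A_TRUE, LAMBDA_TRUE = 3.0, 2.0  # hypothetical shape a and rate lambda
# random.gammavariate(alpha, beta) takes the shape and the SCALE, i.e. 1 / lambda
x = [random.gammavariate(A_TRUE, 1.0 / LAMBDA_TRUE) for _ in range(20_000)]
n = len(x)

m1 = sum(x) / n                    # first empirical moment
m2 = sum(xi * xi for xi in x) / n  # second empirical moment

a_hat = m1 ** 2 / (m2 - m1 ** 2)
lambda_hat = m1 / (m2 - m1 ** 2)

print(a_hat, lambda_hat)
```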


7.8 Finite Population Correction Factor 

Now we prove another result which will be required in some statistical tests that we will see later.

Suppose we have a population of N individuals that we represent by the set {1, 2, ..., N} and
a random variable X which is a map from {1, 2, ..., N} to $\mathbb{R}$. We denote $x_i = X(i)$.
The mean of X is thus given by:

$$\mu = \frac{1}{N}\sum_{i=1}^{N}x_i \tag{7.827}$$

Remember that the variance of X is by definition:

$$\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^{2}$$

Now we consider the set E of samples of size n taken in {1, 2, ..., N} with 0 < n < N. Each
(ordered) sample has a probability of being drawn equal to:

$$\frac{1}{N}\cdot\frac{1}{N-1}\cdots\frac{1}{N-(n-1)} = \frac{(N-n)!}{N!}$$

We are interested in the random variable $\bar{X}$ defined on E and equal to the sample mean.
More specifically:

$$\bar{X}(i_1, \ldots, i_n) = \frac{1}{n}\sum_{k=1}^{n}x_{i_k}$$

n k = 1 




To calculate the variance $V(\bar{X})$, we will express $\bar{X}$ as a sum of random variables. Indeed, if we
define the variables $X_k$, with k = 1...N, by:

$$X_k(i_1, \ldots, i_n) = \begin{cases} x_k & \text{if } k \in \{i_1, \ldots, i_n\} \\ 0 & \text{otherwise} \end{cases}$$

We have naturally, by the previous definition (note carefully the limits of the sum!):

$$\bar{X} = \frac{1}{n}\sum_{k=1}^{N}X_k$$

and thus we get:

$$V(\bar{X}) = \frac{1}{n^{2}}V\left(\sum_{k=1}^{N}X_k\right) = \frac{1}{n^{2}}\left[\sum_{k=1}^{N}V(X_k) + \sum_{i \neq j}\operatorname{cov}(X_i, X_j)\right]$$

The random variables $X_k$ are not pairwise independent; in fact, as we shall see, their covariances
are not zero if N is finite. Otherwise (zero covariance), we find a result already proved earlier:

$$V(\bar{X}) = \sigma_{\bar{X}}^{2} = \frac{\sigma^{2}}{n} \tag{7.835}$$

So we need to calculate the variances $V(X_k)$ and covariances $\operatorname{cov}(X_i, X_j)$.

For this purpose we will use the Huyghens relation and we will start by calculating the mean $E(X_k)$:

$$E(X_k) = P(X_k = x_k)\,x_k \tag{7.836}$$

But $P(X_k = x_k)$ is the probability that a sample contains k. This probability is obviously equal
to n/N and therefore:

$$E(X_k) = P(X_k = x_k)\,x_k = \frac{n}{N}x_k \tag{7.837}$$

Similarly we obtain:

$$E(X_k^{2}) = P(X_k = x_k)\,x_k^{2} = \frac{n}{N}x_k^{2} \tag{7.838}$$

We can therefore calculate the variance $V(X_k)$:

$$V(X_k) = E(X_k^{2}) - E(X_k)^{2} = \frac{n}{N}x_k^{2} - \left(\frac{n}{N}x_k\right)^{2} = \frac{n(N-n)}{N^{2}}x_k^{2} \tag{7.839}$$

To calculate the covariances we now need to calculate the means $E(X_iX_j)$:

$$E(X_iX_j) = P(X_i = x_i, X_j = x_j)\,x_ix_j \tag{7.840}$$

But $P(X_i = x_i, X_j = x_j)$ is the probability that a sample contains i and j. This probability is
obviously given by:

$$\frac{n}{N}\cdot\frac{n-1}{N-1}$$



and therefore:

$$E(X_iX_j) = \frac{n(n-1)}{N(N-1)}x_ix_j$$

We can now compute the covariance:

$$\operatorname{cov}(X_i, X_j) = E(X_iX_j) - E(X_i)E(X_j) = \frac{n(n-1)}{N(N-1)}x_ix_j - \frac{n^{2}}{N^{2}}x_ix_j = -\frac{n(N-n)}{N^{2}(N-1)}x_ix_j$$

We are now able to simplify $V(\bar{X})$:

$$V(\bar{X}) = \frac{1}{n^{2}}\left[\sum_{k=1}^{N}V(X_k) + \sum_{i \neq j}\operatorname{cov}(X_i, X_j)\right] = \frac{1}{n^{2}}\left[\frac{n(N-n)}{N^{2}}\sum_{k=1}^{N}x_k^{2} - \frac{n(N-n)}{N^{2}(N-1)}\sum_{i \neq j}x_ix_j\right]$$

Using Huyghens theorem we get:

$$V(X) = \sigma^{2} = E(X^{2}) - E(X)^{2} = E(X^{2}) - \mu^{2} \quad\Rightarrow\quad \sigma^{2} + \mu^{2} = E(X^{2})$$

Using the previous relation:

$$E(X^{2}) = \frac{1}{N}\sum_{k=1}^{N}x_k^{2} = \sigma^{2} + \mu^{2} \tag{7.846}$$

$$\sum_{k=1}^{N}x_k^{2} = N(\sigma^{2} + \mu^{2}) \tag{7.847}$$

For the double sum $\sum_{i \neq j}x_ix_j$, we have:

$$\sum_{i \neq j}x_ix_j = \sum_{i=1}^{N}x_i(x_1 + \ldots + x_{i-1} + x_{i+1} + \ldots + x_N) = \sum_{i=1}^{N}x_i(N\mu - x_i)$$

$$= N\mu\sum_{i=1}^{N}x_i - \sum_{i=1}^{N}x_i^{2} = N\mu\,N\mu - \sum_{i=1}^{N}x_i^{2} = N^{2}\mu^{2} - N(\sigma^{2} + \mu^{2})$$

n v 7 





Substituting these two sums, we finally get:

$$V(\bar{X}) = \frac{1}{n^{2}}\cdot\frac{n(N-n)}{N^{2}}\left[N(\sigma^{2} + \mu^{2}) - \frac{N^{2}\mu^{2} - N(\sigma^{2} + \mu^{2})}{N-1}\right]$$

$$= \frac{N-n}{nN^{2}}\cdot\frac{N(N-1)(\sigma^{2} + \mu^{2}) - N^{2}\mu^{2} + N(\sigma^{2} + \mu^{2})}{N-1}$$

$$= \frac{N-n}{nN^{2}}\cdot\frac{N^{2}(\sigma^{2} + \mu^{2}) - N^{2}\mu^{2}}{N-1} = \frac{\sigma^{2}}{n}\cdot\frac{N-n}{N-1}$$

The famous factor:

$$fpc = \sqrt{\frac{N-n}{N-1}}$$

that we have already encountered during our study of the hypergeometric distribution is named
"finite population correction factor" and has the effect of reducing the standard error,
all the more so as n is large.
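
For a small population the result can be verified exactly by enumerating every possible sample. The sketch below uses a hypothetical population of six values and exact rational arithmetic, so that the two sides of the relation can be compared without rounding.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 7, 8, 12, 15, 21]  # hypothetical finite population (N = 6)
N, n = len(population), 3           # sample size n, drawn without replacement

mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / N

# Every sample of size n is equally likely: enumerate them all
means = [Fraction(sum(s), n) for s in combinations(population, n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

predicted = sigma2 / n * Fraction(N - n, N - 1)  # sigma^2/n * (N - n)/(N - 1)
fpc = ((N - n) / (N - 1)) ** 0.5                 # correction on the standard error

print(var_of_means, predicted, fpc)
```

Enumerating unordered samples is enough here, since the sample mean does not depend on the order of the draws.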

7.9 Confidence Intervals 

Until now we have always determined the likelihood estimators or simple estimators (variance,
standard deviation) from theoretical statistical distributions or measured on an entire population
of data.

Definition (#97): A "confidence interval" is a pair of numbers that defines (a posteriori) the
range of possible values, with a certain cumulative probability, of a (point) estimator of a
given statistical indicator from a sample of an experiment (the range of the statistical indicator
being usually calculated using real measured parameters). It is the most common statistical 

We now naturally turn to the task of asking ourselves what the sample sizes
of our measured data must be to have some validity (C.I.: confidence interval) for our estimators, or even


to which confidence interval a given standard deviation or quantile corresponds in a Normal centered
reduced distribution (for large samples), in a chi-square distribution, Student distribution
or Fisher distribution (we will see the last two cases, for small sample sizes, in the section on
analysis of variance or ANOVA), when the mean or variance are known or unknown respectively
on all or part of the given population.

It is important to know that these confidence intervals often use the central limit theorem, which
will be proved later (to avoid any possible frustration), and that the developments that we will do now
are also useful in the field of (a posteriori) Hypothesis Tests, which have a major role in statistics
and therefore indirectly in all fields of science!

Finally, it could be useful to indicate that a large number of organizations (private or institutional)
produce false statistics because the assumptions and conditions of use of these confidence
intervals (and hence of hypothesis tests) are not rigorously verified or are simply omitted or, worse, the
whole base (measurements) is not collected according to the rules of the art (reliability of the data
collection and reproducibility protocols not validated by scientific peers).

The reader must also know that we have put many other detailed proofs of confidence interval
techniques, related for example to regression techniques, in the section Theoretical Computing.


The practitioner should be very careful about the calculation of confidence intervals and
the use of hypothesis testing in practice. This is why, to avoid trivial errors of usage or
interpretation, it is important to refer to the following international standards, e.g.: ISO
2602:1980, ISO 2854:1976 (Statistical interpretation of data - Techniques of estimation
and tests relating to means and variances), ISO 3301:1975 (Statistical interpretation of
data - Comparison of two means in the case of paired observations), ISO 3494:1976 (Statistical
interpretation of data - Effectiveness of tests relating to means and variances), ISO
5479:1997 (Statistical interpretation of data - Tests for departure from the normal distribution),
ISO 10725:2000 + ISO 11648-1:2003 + ISO 11648-2:2001 (Sampling plans
and procedures for acceptance for control of bulk materials), ISO 11453:1996 (Statistical
interpretation of data - Tests and confidence intervals relating to proportions), ISO
16269-4:2010 (Statistical interpretation of data - Detection and treatment of outliers),
ISO 16269-6:2005 (Statistical interpretation of data - Determination of statistical tolerance
intervals), ISO 16269-8:2004 (Statistical interpretation of data - Determination of
prediction intervals), ISO/TR 18532:2009 (Guidelines for the application of statistical
quality and industrial standards).

7.9.1 C.I. on the Mean with known Variance 

Let’s start with the simplest and most common case, which is the determination of the number of
individuals needed to have some confidence in the average of the measurements of a random variable
assumed to follow a Normal distribution.

First let us recall that we showed at the beginning of this chapter that the standard error (standard
deviation of the mean) is, under the assumption of independent and identically distributed



variables (i.i.d.):

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Now, before we go any further, consider X as a random variable following a Normal distribution
with mean $\mu$ and standard deviation $\sigma$. We would like the random variable to have, for example,
a 95% cumulative probability of being in a given bounded symmetric interval. This is
expressed as follows:

$$P(\mu - \delta \le X \le \mu + \delta) = 0.95$$



Therefore, with a confidence level of 95% you will be right a posteriori 19 times out
of 20 (or with any other confidence level, or risk level $\alpha$ = 1 - confidence level, here 5%,
that you will have set in advance). On average, your conclusions will therefore be good, but we
can never know whether a particular decision is good! If the risk level is very low but
the event still occurs, specialists then speak of a "large deviation" or a "black swan".
The management of outliers is addressed in ISO 16269-4:2010 (Detection and treatment of
outliers), which any engineer doing business statistics has to follow.


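By centering and reducing the random variable:

$$P\left(-\frac{\delta}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{\delta}{\sigma}\right) = 0.95 \qquad (7.854)$$

Let us now write Y the reduced centered variable:

$$P\left(-\frac{\delta}{\sigma} \le Y \le \frac{\delta}{\sigma}\right) = 0.95 \qquad (7.855)$$

Since the Normal centered reduced distribution is symmetric:

$$1 - 2P\left(Y \le -\frac{\delta}{\sigma}\right) = 0.95$$

that is:

$$P\left(Y \le -\frac{\delta}{\sigma}\right) = 0.025$$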

From there, reading the statistical tables of the standard Normal distribution (or using simple
spreadsheet software), we have to satisfy the equality:

$$\frac{\delta}{\sigma} \cong 1.96 \quad\Rightarrow\quad \delta = 1.96\sigma \qquad (7.858)$$


Which can easily be obtained with Microsoft Excel 11.8346 by using the function:

=-NORMSINV((1-0.95)/2)
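
For readers without a spreadsheet, the same quantile can be sketched in Python with the standard library only. The function name `z_quantile` is an illustrative choice and not part of the original text; the call mirrors the Excel formula above (lower-tail quantile, sign flipped).

```python
from statistics import NormalDist


def z_quantile(confidence: float) -> float:
    """Two-sided standard Normal quantile Z for a given confidence level."""
    alpha = 1.0 - confidence  # risk level
    # Same idea as =-NORMSINV((1-confidence)/2): lower-tail quantile, sign flipped.
    return -NormalDist().inv_cdf(alpha / 2.0)


print(round(z_quantile(0.95), 2))
```

Changing the argument (0.90, 0.99, ...) gives the Z value for any other confidence level.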


In the general case (for a level other than 95%) this is traditionally written as follows, where Z
is the quantile of the standard Normal distribution corresponding to half of the chosen threshold:

$$\delta = Z\sigma \qquad (7.859)$$

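Now, consider that the variable X on which we wish to make statistical inference is the
average $\bar{X}$ (and we show later that, once centered and reduced, it follows a Normal centered
reduced distribution). Its standard deviation is then the standard error, so that:

$$\delta = Z\frac{\sigma}{\sqrt{n}}$$

Then we get:

$$n = \frac{Z^2\sigma^2}{\delta^2}$$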

from which we obviously take (normally...) the next higher integer value... The latter relation is
usually written in the following way, which better highlights the width of the confidence interval
at an underlying threshold level:


Relation named "sample size estimation by Normal distribution". 

Thus, we now know the number of individuals we must have in order to get a given precision
interval $\delta$ (margin of error) around the mean, such that a given percentage of the measurements
lie in this range, assuming the theoretical standard deviation $\sigma$ is known (or imposed) in advance
(typically used in quality engineering or surveys).

In other words, we can calculate the number n of individuals to measure in order to ensure a given
confidence interval (associated with the quantile Z) for the measured average, assuming the theoretical
standard deviation is known (or imposed) and wishing a precision $\delta$ in absolute value on the mean.
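
The sample size relation can be sketched as follows. This is only an illustration under the stated assumptions (Normal data, known $\sigma$); the function name `sample_size` and the numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(sigma: float, delta: float, confidence: float = 0.95) -> int:
    """Smallest integer n such that Z * sigma / sqrt(n) <= delta."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # n = Z^2 sigma^2 / delta^2, rounded up to the next integer
    return math.ceil((z * sigma / delta) ** 2)


# Hypothetical case: sigma = 2.0 and a desired margin of error delta = 0.5
print(sample_size(sigma=2.0, delta=0.5))
```

Halving the margin $\delta$ roughly quadruples the required n, which is the practical cost of precision in this relation.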

However ... in reality, the variable Z comes from the central limit theorem (see below), which
gives for a large sample size (approximately):

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Rearranging we get:

$$\mu = \bar{X} - Z\frac{\sigma}{\sqrt{n}}$$

and as Z can be negative or positive, it is more logical to write this as:

$$\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$$






$$\bar{X} - Z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z\frac{\sigma}{\sqrt{n}}$$

That engineers sometimes write:

$$LCL \le \mu \le UCL$$



where LCL is the lower confidence limit and UCL the upper confidence limit. This is the Six 
Sigma terminology (see section Industrial Engineering). 

And we have seen earlier that for a confidence interval of 95% we have Z = 1.96. And since 
the Normal distribution is symmetric: 

$$95\% = 1 - 5\% = 1 - \alpha = 1 - 2 \times 2.5\% = 1 - 2 \times 0.025 \qquad (7.868)$$

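Thus we finally write the "one sample Z test":

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where we define, for all tests having the same structure, the "margin of error" by:

$$ME = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad (7.870)$$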
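
As a minimal sketch of the interval above, assuming $\sigma$ is known and the sample mean is already computed, the limits LCL and UCL can be obtained as follows. The function name and the numbers are illustrative only.

```python
import math
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Return (LCL, UCL) for the theoretical mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Z_{alpha/2}
    margin = z * sigma / math.sqrt(n)            # ME, the margin of error
    return mean - margin, mean + margin


# Hypothetical measurements: sample mean 10.0, sigma 2.0, n = 100
lcl, ucl = z_confidence_interval(mean=10.0, sigma=2.0, n=100)
print(lcl, ucl)
```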


As we have already mentioned, and as we will prove a little further on, the centered and reduced
arithmetic mean of a series of independent and identically distributed random variables with finite
variance asymptotically follows a standard Normal distribution; this is why the confidence interval
above is very general! This is also why we sometimes speak of the "asymptotic confidence interval
of the mean".

These intervals obviously originate from the fact that in statistics we very often work with
samples and not with the entire available population. The selected sample thus affects the value of
the point estimator. We then speak of "sampling fluctuation".

In the particular case of a C.I. (confidence interval) at 95%, the last relation is written:

$$\bar{X} - Z_{0.025}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{0.025}\frac{\sigma}{\sqrt{n}} \qquad (7.871)$$

Sometimes we find the penultimate inequality in the following equivalent notation:

$$\bar{X} - Z_{\alpha/2}\sigma_{\bar{X}} \le \mu \le \bar{X} + Z_{\alpha/2}\sigma_{\bar{X}}$$

or more rarely with the following general notation (for all intervals):

$$\bar{X} - ME \le \mu \le \bar{X} + ME \qquad (7.873)$$

where ME stands for "margin of error". We are thus now able to estimate the sample sizes
needed to obtain a certain level of confidence $1 - \alpha$ in an outcome, or to estimate the confidence
interval in which the theoretical mean lies, knowing the experimental (empirical) average and
the maximum likelihood estimator of the standard deviation. We can of course also


determine the a posteriori probability that the mean is outside a given range ... (both
being widely used in industry).

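Finally, note that from the previous result we immediately deduce, by the stability property of the
Normal distribution (shown above), the following test that we find in much statistical software:

$$Z = \frac{(\bar{X}_2 - \bar{X}_1) - (\mu_2 - \mu_1)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

named "bilateral Z test on the difference of two means" or sometimes "two sample
Z test", with the corresponding confidence interval:

$$(\bar{X}_2 - \bar{X}_1) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_2 - \mu_1 \le (\bar{X}_2 - \bar{X}_1) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$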


And it is not because two averages are significantly different that their confidence intervals do
not overlap! This is shown by the graph below, obtained with the Minitab 16 software, where the
Z-test of the difference is significant at 95%:


Figure 4.108 - Line plot of the 95% confidence intervals for the mean of Group 1 and Group 2
(P-value < 0.05): the two intervals overlap while the means are significantly different at a
confidence level of 95%.
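
The phenomenon can be reproduced numerically. The sketch below uses purely hypothetical groups (the values are not those of the Minitab figure) and assumes known standard deviations, so that the Z relations above apply.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # Z_{alpha/2} for a 95% level


def mean_ci(mean, sigma, n):
    """95% confidence interval of a single mean with known sigma."""
    half = Z * sigma / math.sqrt(n)
    return mean - half, mean + half


def diff_ci(mean1, sigma1, n1, mean2, sigma2, n2):
    """95% confidence interval of mu2 - mu1 (two sample Z test)."""
    half = Z * math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    diff = mean2 - mean1
    return diff - half, diff + half


# Hypothetical groups: same sigma and size, means 9.6 and 9.9
ci1 = mean_ci(9.6, 0.5, 25)
ci2 = mean_ci(9.9, 0.5, 25)
ci_diff = diff_ci(9.6, 0.5, 25, 9.9, 0.5, 25)

overlap = ci1[1] > ci2[0]                            # individual intervals overlap
significant = not (ci_diff[0] <= 0.0 <= ci_diff[1])  # 0 lies outside the CI of the difference
print(ci1, ci2, ci_diff, overlap, significant)
```

The reason is that the half-width for the difference grows like $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, which is smaller than the sum of the two individual half-widths.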


The size of the parent population does not enter into the relations developed above, neither in
the calculation of the confidence intervals nor in that of the sample size, because it is
considered as infinite. So be careful not to end up with sample sizes that are larger than the
actual parent population...


7.9.2 C.I. on the Variance with known Mean 

Let’s start by demonstrating a fundamental property of the Chi-square distribution: 

If a random variable X follows a Normal centered reduced distribution $X = \mathcal{N}(0, 1)$ then its
square follows a chi-square distribution with 1 degree of freedom:

$$X^2 = \chi^2(1) \qquad (7.876)$$

This result is sometimes named a "Wald statistic", and any statistical test using it directly (we
should rather speak of a "test family") can be designated by the name "Wald's test" (for a
concrete example see the Cochran-Mantel-Haenszel test in the section of Theoretical Computing).

Proof 4.33.5. To prove this property, it suffices to calculate the density of the random variable
$X^2$ with $X = \mathcal{N}(0, 1)$. If $X = \mathcal{N}(0, 1)$ and if we set $Y = X^2$, then for all y > 0
we get:

$$P(Y \le y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) \qquad (7.877)$$

Since the standard Normal distribution is symmetric about 0 for the random variable X, we can
write:

$$P(Y \le y) = 2P(0 \le X \le \sqrt{y}) \qquad (7.878)$$

Denoting by $\Phi$ the cumulative distribution function of the standard normal distribution, we
have:

$$P(Y \le y) = 2\Phi(\sqrt{y}) - 2 \times 0.5 = 2\Phi(\sqrt{y}) - 1 \qquad (7.879)$$

since:

$$\Phi(0) = P(\mathcal{N}(0, 1) \le 0) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{0} e^{-k^2/2}\,dk = 0.5 \qquad (7.880)$$

so that indeed:

$$P(Y \le y) = 2\Phi(\sqrt{y}) - 2 \cdot 0.5 = 2\Phi(\sqrt{y}) - 1 \qquad (7.881)$$

The cumulative distribution function of the random variable $Y = X^2$ is thus given by:

$$P(Y \le y) = 2\Phi(\sqrt{y}) - 1 \qquad (7.882)$$

if y is greater than or equal to zero, and is zero if y is less than zero. We will denote this
cumulative distribution function $F_Y(y)$ in the following calculations.

Since the density function is the derivative of the cumulative distribution function,
and X follows a Normal centered reduced distribution, we have for the random variable X:

$$f_X(x) = \Phi'(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2} \qquad (7.883)$$


and then it follows for the density of Y (which is the square of X, as a reminder!):

$$f_Y(y) = \frac{d}{dy}P(Y \le y) = \frac{d}{dy}\left(2\Phi(\sqrt{y}) - 1\right) = 2\Phi'(\sqrt{y})\frac{1}{2\sqrt{y}} = \frac{\Phi'(\sqrt{y})}{\sqrt{y}} = \frac{1}{\sqrt{y}}\frac{1}{\sqrt{2\pi}}e^{-y/2} = \frac{1}{\sqrt{2\pi y}}e^{-y/2}$$

This last expression is exactly the relation we obtained during our study of the
chi-square distribution when imposing a number of degrees of freedom equal to one.

The theorem is therefore proved: if X follows a Normal centered reduced distribution,
then its square follows a Chi-square distribution with 1 degree of freedom:

$$\left(\mathcal{N}(0, 1)\right)^2 = \chi^2(1)$$

□ Q.E.D.
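
The two results of the proof can be checked numerically with a short sketch (standard library only; the function names are illustrative). It compares a numerical derivative of the cumulative distribution function $2\Phi(\sqrt{y}) - 1$ with the closed-form density, and evaluates the cumulative probability at $y = 1.96^2$.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard Normal cumulative distribution function


def chi2_1_cdf(y: float) -> float:
    """P(Y <= y) for Y = X**2 with X ~ N(0, 1), i.e. 2*Phi(sqrt(y)) - 1."""
    return 0.0 if y < 0 else 2.0 * PHI(math.sqrt(y)) - 1.0


def chi2_1_pdf(y: float) -> float:
    """Density obtained in the proof: exp(-y/2) / sqrt(2*pi*y), for y > 0."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)


# Central finite difference of the CDF compared with the closed-form density
y, h = 1.5, 1e-5
numeric = (chi2_1_cdf(y + h) - chi2_1_cdf(y - h)) / (2 * h)
print(numeric, chi2_1_pdf(y))

# 95% of the mass of chi-square(1) lies below 1.96 squared
print(chi2_1_cdf(1.96 ** 2))
```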

This type of relation is used mainly in industrial processes and their control (see section Indus- 
trial Engineering). 

Now let us open a parenthesis that is quite important in some linear regression software reports
and especially in the curvature test for design of experiments (see section Industrial Engineering).
Let us recall that we have:

$$T_n = \frac{\mathcal{N}(0, 1)}{\sqrt{\chi_n^2 / n}}$$

And we have just proved above that:

$$\mathcal{N}(0, 1)^2 = \chi_1^2$$

Therefore:

$$T_n = \frac{\sqrt{\chi_1^2 / 1}}{\sqrt{\chi_n^2 / n}} = \sqrt{\frac{\chi_1^2 / 1}{\chi_n^2 / n}}$$

And as we have also seen that:

$$F_{n,m} = \frac{\chi_n^2 / n}{\chi_m^2 / m}$$

it follows that:

$$T_n = \sqrt{F_{1,n}}$$

or more commonly in practice:

$$T_n^2 = F_{1,n}$$


We will now use a result proved during our study of the Gamma distribution. We have indeed
seen that the sum of two independent random variables following Gamma distributions with the same
parameter $\lambda$ also follows a Gamma distribution, in which the parameters p and q are added:

$$X + Y = \gamma(p + q, \lambda) \qquad (7.892)$$

As the Chi-square distribution is a special case of the Gamma distribution, the same result
applies.

To be more precise, this is equivalent to saying: if $X_1, ..., X_k$ are independent and identically
distributed (i.i.d.) random variables $\mathcal{N}(0, 1)$, then by extension of the above proof, where we
have shown that:

$$\left(\mathcal{N}(0, 1)\right)^2 = \chi^2(1) \qquad (7.893)$$

and by the linearity property of the Gamma distribution, the sum of their squares follows a
chi-square distribution of degree k such that:

$$\chi^2(k) = X_1^2 + X_2^2 + ... + X_k^2$$

Thus, the $\chi^2$ distribution with k degrees of freedom is the probability distribution of the sum
of the squares of k Normal centered reduced variables that are independent of each other. It is in
fact the linearity property of the chi-square distribution (implicitly the linearity of the Gamma
distribution).
Now let us see another significant property of the chi-square distribution: if $X_1, ..., X_n$ are
independent and identically distributed $\mathcal{N}(\mu, \sigma)$ random variables (thus following a Normal
distribution with the same mean and the same standard deviation) and if we write the maximum
likelihood estimator of the variance as:

$$S_*^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2$$

then the ratio of the random variable $S_*^2$ to the variance $\sigma^2$ (the square of the "true" or
"theoretical" standard deviation, assumed to be known for the entire population!), multiplied
by the number n of individuals, follows a chi-square distribution of degree n such that:

$$\chi^2(n) = \chi_n^2 = \frac{nS_*^2}{\sigma^2}$$

This result is named the "Cochran theorem" or "Fisher-Cochran theorem" (in the particular
case of Gaussian samples) and thus gives us a distribution for the empirical standard deviations
(whose parent law is a Normal distribution!).

Using the value of the variance proved during our study of the chi-square distribution we
have:

$$V(\chi^2(n)) = 2n$$


But n and $\sigma$ are imposed and are therefore considered as constants. We have therefore:

$$\frac{n^2}{\sigma^4}V(S_*^2) = V(\chi^2(n)) = 2n \;\Rightarrow\; V(S_*^2) = \sigma_{S_*^2}^2 = \sigma^4\frac{2}{n} \qquad (7.898)$$

And therefore we have an expression for the standard deviation of the empirical variance
if we know the standard deviation of the population:

$$\sigma_{S_*^2} = \sigma^2\sqrt{\frac{2}{n}}$$
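
A quick Monte Carlo sketch can make relation (7.898) tangible. It is only an illustration: the seed, the number of trials and the parameter values are arbitrary assumptions, and the simulated value only approaches $2\sigma^4/n$ up to sampling noise.

```python
import random


def simulate_var_of_s2(n, sigma, mu=0.0, trials=20000, seed=1):
    """Empirical variance of S*^2 = (1/n) * sum((X_i - mu)^2) over many samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        values.append(sum((x - mu) ** 2 for x in sample) / n)  # known mean mu
    center = sum(values) / trials
    return sum((v - center) ** 2 for v in values) / trials


n, sigma = 10, 2.0
simulated = simulate_var_of_s2(n, sigma)
theoretical = 2 * sigma ** 4 / n  # relation (7.898)
print(simulated, theoretical)
```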

But we have proved during our study of estimators that:

a = 




It follows: 

Cs* — 

n — 1 


~ a sl ~ 


a "\l~ = 

n V n 

I2(n — 1) 
n 2 





From this follows the relation, sometimes important in practice, for the estimator of the
standard deviation of ... the standard deviation:


Recall that the parent population is said to be "infinite" if the sample is selected with replacement
or if the size N of the parent population is much larger than that of the sample of size n.


R1. In laboratories, the $X_1, ..., X_n$ can be seen as a class of individuals of the same
product, studied identically by different research teams with instruments of the same
precision (identical standard deviation of the measurement).

R2. $S_*^2$ is the "inter-class variance", also called "explained variance". It therefore gives a
measure of the variance occurring across different laboratories.


What is interesting here is that from the calculation of the chi-square distribution, and by
knowing n and the variance $\sigma^2$, it is possible to estimate the interclass variance (and also the
interclass standard deviation).

To see that this latter property is a generalization of the basic relation:

$$\chi^2(n) = X_1^2 + X_2^2 + ... + X_n^2 \qquad (7.903)$$

it suffices to see that the random variable $nS_*^2/\sigma^2$ is a sum of n squares of $\mathcal{N}(0, 1)$
variables independent of each other. Indeed, recall that a centered reduced random variable (see our
study of the Normal distribution) is given by:

$$Y = \frac{X - \mu}{\sigma} \qquad (7.904)$$





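We have:

$$\frac{nS_*^2}{\sigma^2} = \frac{n}{\sigma^2}\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2$$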

However, since the random variables $X_1, ..., X_n$ are independent and identically distributed
according to a Normal distribution, the random variables:

$$\frac{X_1 - \mu}{\sigma}, ..., \frac{X_n - \mu}{\sigma}$$

are also independent and identically distributed according to a Normal distribution, but a
centered reduced one.


Therefore:

$$\frac{nS_*^2}{\sigma^2} = \chi^2(n)$$

and rearranging we get:

$$\sigma^2 = \frac{nS_*^2}{\chi^2(n)}$$


So, over the population of measurements, the true variance follows the relation given
above. It is therefore feasible to make statistical inference on the standard deviation when the
theoretical mean is known (...).

Since the chi-square distribution is not symmetric, the only way to make this inference is to
use numerical calculations, and we then denote the confidence interval at the level of 95% (for
example ...) as follows:

$$\frac{nS_*^2}{\chi_{2.5\%}^2(n)} \le \sigma^2 \le \frac{nS_*^2}{\chi_{97.5\%}^2(n)}$$

Or, by writing $95\% = 1 - \alpha$:

$$\frac{nS_*^2}{\chi_{\alpha/2}^2(n)} \le \sigma^2 \le \frac{nS_*^2}{\chi_{1-\alpha/2}^2(n)}$$

the denominator being obviously the quantile of the chi-2 distribution. This relation is rarely
used in practice as the theoretical average (mean) is not known. In order to avoid confusion, the
latter relation is often denoted as follows:

$$\frac{nS_*^2}{\chi_{\alpha/2,n}^2} \le \sigma^2 \le \frac{nS_*^2}{\chi_{1-\alpha/2,n}^2}$$


Let’s see the most common case: 


7.9.3 C.I. on the Variance with empirical Mean 

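Let us now make statistical inference when the theoretical average of the population (i.e. the mean)
is not known. To do this, consider now the sum:

$$\chi^2(n) = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left[(X_i - \bar{X}) + (\bar{X} - \mu)\right]^2 \qquad (7.912)$$

where, as a reminder, $\bar{X}$ is the empirical average (arithmetic mean) of the sample:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i$$

Continuing the development we have:

$$\chi^2(n) = \frac{1}{\sigma^2}\left[\sum_{i=1}^{n}(X_i - \bar{X})^2 + 2(\bar{X} - \mu)\sum_{i=1}^{n}(X_i - \bar{X}) + n(\bar{X} - \mu)^2\right]$$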

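However, we have proved earlier in this chapter that the sum of the deviations from the mean
is zero. So:

$$\begin{aligned}
\chi^2(n) &= \frac{1}{\sigma^2}\left[\sum_{i=1}^{n}(X_i - \bar{X})^2 + 2(\bar{X} - \mu)\cdot 0 + n(\bar{X} - \mu)^2\right] \\
&= \frac{1}{\sigma^2}\left[\sum_{i=1}^{n}(X_i - \bar{X})^2 + n(\bar{X} - \mu)^2\right] \\
&= \sum_{i=1}^{n}\frac{(X_i - \bar{X})^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2} = \sum_{i=1}^{n}\frac{(X_i - \bar{X})^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2
\end{aligned}$$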

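and by taking back the unbiased estimator of the variance of the Normal distribution (we change
notation to respect tradition and to differentiate the empirical average from the theoretical mean):

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

we get:

$$\chi^2(n) = \sum_{i=1}^{n}\frac{(X_i - \bar{X})^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$$

or with another common notation:

$$\chi^2(n) = \frac{(n-1)S^2}{\sigma^2} + \left(\frac{M_n - \mu}{\sigma/\sqrt{n}}\right)^2$$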


Since the second term is the square of a Normal centered reduced variable, and therefore follows
a $\chi^2(1)$ distribution, if we remove it we get, by the property of the chi-square distribution
proved above:

$$\chi^2(n - 1) = \frac{(n-1)S^2}{\sigma^2}$$


These developments allow us this time to also make inferences about the variance of a
$\mathcal{N}(\mu, \sigma)$ distribution when the parameters $\mu$ and $\sigma$ of the parent population are both
unknown. It is this result that gives us, for example, the confidence interval:

$$\frac{(n-1)S^2}{\chi_{\alpha/2}^2(n-1)} \le \sigma^2 \le \frac{(n-1)S^2}{\chi_{1-\alpha/2}^2(n-1)}$$

when the theoretical average (mean) $\mu$ is unknown. And again to avoid any confusion, it is more
usual to write:

$$\frac{(n-1)S^2}{\chi_{\alpha/2,n-1}^2} \le \sigma^2 \le \frac{(n-1)S^2}{\chi_{1-\alpha/2,n-1}^2}$$
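
The interval above can be sketched without any statistical library. The code below is an illustration under two assumptions that are not stated in the text: the chi-square cumulative distribution is computed with the classical series of the regularized incomplete gamma function and inverted by bisection, and the subscripts $\alpha/2$ and $1-\alpha/2$ are read as upper-tail probabilities (the only reading for which the lower bound is the smaller one). All function names are hypothetical.

```python
import math


def chi2_cdf(x: float, k: int) -> float:
    """Lower-tail chi-square CDF with k degrees of freedom (series of P(k/2, x/2))."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    i = 1
    while term > 1e-16 * total:
        term *= z / (a + i)
        total += term
        i += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p: float, k: int) -> float:
    """Value q such that P(chi2_k <= q) = p, found by bisection."""
    lo, hi = 0.0, 10.0 * k + 50.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def variance_ci(s2: float, n: int, confidence: float = 0.95):
    """Confidence interval for sigma^2 from the unbiased estimator s2 (mean unknown)."""
    alpha = 1.0 - confidence
    dof = n - 1
    q_high = chi2_quantile(1.0 - alpha / 2.0, dof)  # upper-tail probability alpha/2
    q_low = chi2_quantile(alpha / 2.0, dof)         # upper-tail probability 1 - alpha/2
    return dof * s2 / q_high, dof * s2 / q_low


# Hypothetical sample: n = 11 measurements with unbiased variance S^2 = 4.0
print(variance_ci(4.0, 11))
```

Note how asymmetric the interval is around $S^2$, which reflects the asymmetry of the chi-square distribution mentioned above.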


In the same way as above, we can calculate the standard deviation of the standard deviation, which
is of great importance in the practice of finance:

$$V\left(\frac{(n-1)S^2}{\sigma^2}\right) = \frac{(n-1)^2}{\sigma^4}V(S^2) = V(\chi_{n-1}^2) = 2(n-1) \;\Rightarrow\; V(S^2) = \frac{2\sigma^4}{n-1}$$



7.9.4 C.I. on the Mean with known unbiased Variance 

We proved much earlier that the Student distribution comes from the following relation:

$$T(k) = \frac{Z}{\sqrt{U/k}}$$

if Z and U are independent random variables and if Z follows a Normal centered reduced
distribution $\mathcal{N}(0, 1)$ and U a chi-square distribution $\chi^2(k)$, with the density:

$$f_T(t) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\Gamma\left(\frac{k}{2}\right)\sqrt{k\pi}}\left(1 + \frac{t^2}{k}\right)^{-\frac{k+1}{2}}$$

and remember that its density function is symmetrical!

Here is a very important application of the above result:

Suppose that $X_1, ..., X_n$ is a random sample of size n from a distribution $\mathcal{N}(\mu, \sigma)$. We can
then already write, following the developments made above:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

And for U, which follows a $\chi^2(k)$ distribution, if we set k = n - 1 then according to the
results above:

$$U = \frac{(n-1)S^2}{\sigma^2} = \chi^2(n - 1)$$

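We then get after some trivial simplifications:

$$T(n-1) = \frac{Z}{\sqrt{U/(n-1)}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2(n-1)}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\cdot\frac{\sigma}{\sqrt{S^2}} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$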

So since:

$$T(k) = \frac{Z}{\sqrt{U/k}}$$

follows a Student distribution with parameter k, we get the "independent one-sample t-test",
or simply the "one-sample t-test":

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = T(n - 1)$$

which also follows a Student distribution with parameter n - 1 and is widely used in laboratories
for calibration testing.



This also gives us after rearrangement:

$$\mu = \bar{X} - \frac{S}{\sqrt{n}}T(n - 1) \qquad (7.930)$$

This allows us to make inference about the mean $\mu$ of a Normal distribution when the theoretical
standard deviation is unknown (meaning that there are not enough experimental values) but
the unbiased estimator of the standard deviation is known. It is this result that gives us
the confidence interval:

$$\bar{X} - \frac{S}{\sqrt{n}}T_{\alpha/2}(n - 1) \le \mu \le \bar{X} + \frac{S}{\sqrt{n}}T_{\alpha/2}(n - 1)$$


where we see the same factors as for the statistical inference on the average (mean) of a
(theoretical) random variable with known standard deviation, as the Student distribution is
asymptotically equal to the Normal distribution for large values of n. Thus, the previous interval
and the following one:

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad (7.932)$$

give very similar values (to three decimal places) for values of n around 10,000 (in practice
we consider that for 100 this is already the same...).

We immediately deduce, by the stability property of the chi-square distribution (proved above,
where we saw that this property arises from the Gamma distribution), the following test that we
find in much statistical software:

$$(\bar{X}_2 - \bar{X}_1) - T_{\alpha/2}(n-2)\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \le \mu_2 - \mu_1 \le (\bar{X}_2 - \bar{X}_1) + T_{\alpha/2}(n-2)\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

named "bilateral t (Student) test on the difference of two means" or more simply "two sample
t-test".

We can of course therefore also determine the probability that the mean is inside or outside a
certain range ... (both cases being widely used in industry).

The reader can check for fun with Microsoft Excel 11.8346 that, for a large number of
measurements n, the Student distribution tends to the Normal centered reduced distribution, by
comparing the values of the two functions below:

=T.INV(5%/2,n-1)
=NORM.S.INV(5%/2)
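
The same comparison can be sketched in Python without a spreadsheet. As an assumption of this illustration, the Student cumulative distribution is obtained by integrating the density with Simpson's rule and inverted by bisection; this is not how Excel computes `T.INV`, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(t: float, k: int) -> float:
    """Student density with k degrees of freedom."""
    log_c = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
             - 0.5 * math.log(k * math.pi))
    return math.exp(log_c - (k + 1) / 2 * math.log1p(t * t / k))


def t_cdf(t: float, k: int, steps: int = 2000) -> float:
    """CDF by Simpson's rule on [0, |t|], using the symmetry of the density."""
    x = abs(t)
    if x == 0.0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, k) + t_pdf(x, k)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, k)
    half = s * h / 3.0
    return 0.5 + half if t > 0 else 0.5 - half


def t_quantile(p: float, k: int) -> float:
    """Lower-tail quantile, the analogue of =T.INV(p, k), by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


z = NormalDist().inv_cdf(0.05 / 2)  # analogue of =NORM.S.INV(5%/2)
for n in (10, 100, 10000):
    print(n, t_quantile(0.05 / 2, n - 1), z)
```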



The previous result was obtained by William S. Gosset around 1910. Gosset, who had
studied mathematics and chemistry, worked as a statistician for the Guinness brewery
in Dublin. At that time, it was known that if $X_1, ..., X_n$ are independent and identically
distributed random variables then:

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \mathcal{N}(0, 1) \qquad (7.934)$$

However, in statistical applications one was obviously rather interested in the following
quantity:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$

It was then merely assumed that this quantity followed almost a Normal centered reduced
distribution, which was not a bad approximation, as the image below shows (df =
n - 1):

Figure 4.109 - Comparison between the Normal and the Student distribution functions 

After numerous simulations, Gosset came to the conclusion that this approximation was
valid only when n is large enough (which gave him the indication that the central limit
theorem must be somewhere behind it). He decided to determine the origin of the
distribution, and after completing a course in statistics with Karl Pearson he obtained his
famous result, which he published under the pseudonym Student. This is why we call
Student distribution the law that should have been called the Gosset distribution.

Finally, note that the Student's t-test is also used to identify whether changes (increasing or



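vice versa) in the average of two identical populations are statistically significant. That is to
say, if the size of two dependent samples is the same then we can build the following test (we
include the different notations that can be found in the literature and in the many software
packages implementing this test):

$$T(n - 1) = \frac{(\bar{X}_2 - \bar{X}_1) - \delta_0}{S_d/\sqrt{n}} = \frac{\bar{d} - \delta_0}{S_d/\sqrt{n}} \qquad (7.937)$$

where $\bar{d}$ and $S_d$ denote the mean and the unbiased standard deviation of the n paired
differences, and $\delta_0$ the hypothesized difference.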

The previous relation is very useful for comparing the same sample twice in different
measurement situations (sales before and after a discount on an article, for example). This
relation is called "t-test (Student) on the averages of two paired samples (or dependent samples)"
or more simply "paired sample t-test".
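
As a hedged illustration of the paired test, the statistic can be computed from the differences of the paired values. The data below are invented for the example, and the form of the statistic is the usual paired t statistic (mean of the differences divided by their unbiased standard deviation over $\sqrt{n}$).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after, delta0=0.0):
    """T(n-1) statistic for two paired samples of the same size n."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same size")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # stdev() is the unbiased (n - 1) estimator applied to the differences
    return (mean(diffs) - delta0) / (stdev(diffs) / math.sqrt(n))


# Hypothetical sales of five articles before and after a discount
before = [10, 12, 9, 11, 13]
after = [11, 14, 12, 15, 18]
t_value = paired_t_statistic(before, after)
print(t_value)
```

The resulting value is then compared with the quantile $T_{\alpha/2}(n-1)$ of the Student distribution.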

Definition (#98): We speak of "paired samples" if the sample values are taken twice on the
same individuals (i.e. the values of the pairs are not independent, unlike two samples taken
independently).

7.9.5 Binomial exact Test 

Often, when measuring, we want to compare two small samples taken randomly (without
replacement!) from an also small population ... to know whether or not they are statistically
significantly different when we were expecting a perfect equality!

We are looking for a suitable test for the following cases: 

• To know if the sample of a population prefers to use a given technical method of work
rather than another when we expect that the population does not prefer one or the other

• To know if the sample of a population has a predominant characteristic among two
possibilities when we expect that the population is well balanced

Before going further into details, let us recall that we must be extremely cautious about how
the two samples are obtained. The experiment must be unbiased, that is to say, as a reminder, that
the sampling protocol must not favor either of the two characteristics of the population (if you
study the balance between men and women in a population by attracting people to the survey with a gift


in the form of jewelry or just by calling during the workdays you will have a biased sample ... 
because you’ll probably naturally have more women than men...). 

That said, this situation corresponds to a binomial distribution, for which we proved earlier in
this chapter that the probability of k successes in a population of size N, with a probability of
success p (the probability of failure q being therefore 1 - p), is given by the relation:

$$P(N, k) = C_N^k p^k q^{N-k} = C_N^k p^k (1 - p)^{N-k} \qquad (7.938)$$

In the case that interests us we have p = q = 0.5:

$$P(N, k) = C_N^k\, 0.5^k\, 0.5^{N-k} = 0.5^N C_N^k \qquad (7.939)$$

while remembering that the distribution will not be symmetrical and especially if the population 
size N is small. 

If we now denote by x the number of successes (considered as the size of the first sample) and
by y the number of failures (considered as the size of the second sample), then we have:

$$P(N, k) = 0.5^N C_N^k = P(x + y, k) = 0.5^{x+y} C_{x+y}^k \qquad (7.940)$$

This being done, to build the test, and because of the asymmetry of the distribution, we will
calculate the cumulative probability that k is smaller than or equal to the x obtained in the
experiment and add it to the cumulative probability that k is greater than or equal to the y
obtained in the experiment (which correspond respectively to the cumulative probabilities of the
left and right tails of the distribution). This sum corresponds to the probability:

$$P = 0.5^N\sum_{k=0}^{x} C_N^k + 0.5^N\sum_{k=y}^{N} C_N^k \qquad (7.941)$$

and this last relation is called the "binomial exact test (two-tailed)".

If the probability P obtained by the sum is above a certain cumulative probability fixed in
advance, then we say that the difference with a random sample in a perfectly balanced population
is not statistically significant (bilaterally ...); conversely, if it is below, the difference is
statistically significant and we therefore reject the assumed equilibrium.

Therefore, if:

$$0.5^N\sum_{k=0}^{x} C_N^k + 0.5^N\sum_{k=y}^{N} C_N^k = 0.5^N\left(\sum_{k=0}^{x} C_N^k + \sum_{k=y}^{N} C_N^k\right) > \alpha \qquad (7.942)$$

the difference with a balanced population will be considered not statistically significant.
Often we will take $\alpha$ to be at most 5% (but rarely below), which corresponds to a
confidence level of 95%.

Unfortunately, from one statistical software package to another the required parameters or the
results will not necessarily be the same (spreadsheet software, for example, does not include a
specific function for the binomial test, so you will often have to build a table or develop a
function yourself). For example, some software automatically calculates and imposes (which is
quite logical in a sense...):

$$0.5^N\left(\sum_{k=0}^{x-1} C_N^k + \sum_{k=y+1}^{N} C_N^k\right) = P \qquad (7.943)$$




From a small population having two particular characteristics x and y that interest us,
and which we expect to be perfectly balanced (x = y), we actually got x = 5 and
y = 7. We would like to do the calculation with Microsoft Excel 11.8346 to know whether
this difference is statistically significant or not at a level of 5%.

So, to answer this question, we will calculate the cumulative probability:

$$0.5^N\left(\sum_{k=0}^{x} C_N^k + \sum_{k=y}^{N} C_N^k\right) = 0.5^{12}\left(\sum_{k=0}^{5} C_{12}^k + \sum_{k=7}^{12} C_{12}^k\right) \qquad (7.944)$$

which gives us: 


Figure 4.110 - Calculated values of the binomial coefficients in Microsoft Excel 11.8346

thus explicitly: 


Binomial Coeff.
=COMBIN(12,A2)
=COMBIN(12,A3)
=COMBIN(12,A4)
=COMBIN(12,A5)
=COMBIN(12,A6)
=COMBIN(12,A7)
=COMBIN(12,A8)
=COMBIN(12,A9)
=COMBIN(12,A10)
=COMBIN(12,A11)
=COMBIN(12,A12)
=COMBIN(12,A13)
=0.5^12*SUM(B2:B13)

Figure 4.111 - Formulas for calculating binomial coefficients in Microsoft Excel 11.8346 

thus, the cumulative probability being 0.774 (i.e. 77.4%), the difference compared with a
balanced population will be considered as not statistically significant.
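
The same calculation can be sketched in Python. The function name is illustrative; the cap at 1 is an addition of this sketch, needed only when x = y, where the two sums would count the central term twice.

```python
from math import comb


def binomial_exact_two_tailed(x: int, y: int) -> float:
    """Two-tailed exact binomial probability for counts x and y under p = q = 0.5."""
    n = x + y
    low, high = min(x, y), max(x, y)
    left = sum(comb(n, k) for k in range(0, low + 1))    # left tail, k = 0..x
    right = sum(comb(n, k) for k in range(high, n + 1))  # right tail, k = y..N
    return min(1.0, 0.5 ** n * (left + right))


p_value = binomial_exact_two_tailed(5, 7)
print(round(p_value, 3))
print(p_value > 0.05)  # above alpha: difference not statistically significant
```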


This test is also used by most statistical software (such as Minitab) to give a confidence
interval for the conformity of opinions in relation to that of an expert. This is what we call
an R&R study (reproducibility & repeatability) by attributes (see my book on Minitab
for an example).


7.9.6 C.I. for a Proportion 

For information, some statisticians use the fact that the Normal distribution arises from the
Poisson distribution, which itself derives from the binomial distribution (we have proved it when n
tends to infinity and p and q are of the same order), to build a confidence interval in the context
of the analysis of proportions (widely used in quality analysis in industry).

To see this, we denote by $X_i$ the random variable defined by:

$$X_i = \begin{cases} 1 & \text{if the } i\text{-th element of the sample has the attribute } A \\ 0 & \text{otherwise} \end{cases}$$

where the attribute A can be, for example, the property "defective" or "non-defective" in an
analysis of pieces. We denote by k the number of successes of the attribute A.

The random variable $X = X_1 + X_2 + ... + X_n$ follows, as we have proved earlier in this chapter,
a binomial distribution with parameters n and p, with the moments:

$$\mu = E(X) = np$$

$$\sigma = \sqrt{V(X)} = \sqrt{npq} = \sqrt{np(1 - p)}$$


That said, we do not know the true value of p. We will use the estimator of the binomial
distribution proved above:

$$\hat{p} = \frac{k}{n} = \frac{X}{n}$$

Based on the properties of the mean we then have:

$$E(\hat{p}) = \mu_{\hat{p}} = E\left(\frac{X}{n}\right) = \frac{np}{n} = p$$

And by using the properties of the variance, we have the following relation for the variance of
the sample proportion:

$$V(\hat{p}) = \sigma_{\hat{p}}^2 = V\left(\frac{X}{n}\right) = \frac{V(X)}{n^2} = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}$$




This then brings us to:

$$\mu_{\hat{p}} = p \quad\text{and}\quad \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \qquad (7.951)$$

Finally, remember that we have proved that the Normal distribution results from the binomial
distribution under certain conditions (practitioners admit that it is applicable when n > 50 and
np > 5). In other words, the random variable X following a binomial distribution approximately
follows a Normal distribution under certain conditions. Obviously, if X follows a Normal distribution



then X/n also (and so does $\hat{p}$...). Therefore we can center and reduce $\hat{p}$ so that it behaves
as the Normal centered reduced random variable denoted by Z:

$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$


E1. If 5% of the annual production of a business fails, what is the probability
that by taking a sample of 75 pieces of the production line only 2% or less will be
defective?

We therefore have:

$$Z = \frac{0.02 - 0.05}{\sqrt{\dfrac{0.05 \cdot 0.95}{75}}} \cong -1.19$$

The cumulative probability corresponding to that value can easily be obtained with
Microsoft Excel 11.8346:

=NORMSDIST(-1.19)=11.66%

But note that the condition np > 5 is not satisfied here (np = 3.75), so we could refuse to use
this result.
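
Example E1 can be reproduced with a few lines of Python (standard library only; the function name is illustrative). With the unrounded value of Z the cumulative probability is about 11.66%, whereas rounding Z to -1.19 first gives about 11.70%.

```python
import math
from statistics import NormalDist


def proportion_z(p_hat: float, p: float, n: int) -> float:
    """Centered reduced sample proportion: (p_hat - p) / sqrt(p(1-p)/n)."""
    return (p_hat - p) / math.sqrt(p * (1.0 - p) / n)


z = proportion_z(0.02, 0.05, 75)
prob = NormalDist().cdf(z)           # analogue of =NORMSDIST(z)
normal_approx_ok = 75 * 0.05 > 5     # practitioners' condition np > 5
print(z, prob, normal_approx_ok)
```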

E2. In its 1998 report, JP Morgan explained that during the year 1998 its losses
went beyond the Value at Risk (see section Economy) on 20 days out of the 252 working days of
the year, based on a 95% temporal VaR (thus 5% of working days considered as loss). At
the threshold of 95%, was it just bad luck or was the VaR model used bad?

$$Z = \frac{p - \hat{p}}{\sqrt{\dfrac{p(1 - p)}{n}}} = \frac{np - n\hat{p}}{\sqrt{p(1 - p)n}} = \frac{0.05 \cdot 252 - 20}{\sqrt{0.05(1 - 0.05)252}} \cong -2.14 < -1.96$$

So, at this threshold, it was not just bad luck: the number of exceedances differs significantly
from what the VaR model predicts.
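
The arithmetic of example E2 can be checked with the sketch below (the function name is hypothetical). It only computes the statistic and compares its absolute value with the two-sided 95% quantile of the standard Normal distribution.

```python
import math
from statistics import NormalDist


def exceedance_z(k: int, n: int, p: float) -> float:
    """Z statistic (np - k) / sqrt(np(1-p)) for k exceedances in n days."""
    return (n * p - k) / math.sqrt(n * p * (1.0 - p))


z = exceedance_z(k=20, n=252, p=0.05)
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
outside = abs(z) > z_crit             # True: Z lies outside [-1.96, 1.96]
print(z, z_crit, outside)
```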

We can now approximate the confidence interval for the prop